Abstract

Suppose we are given a vector $f$ in a class $\mathcal{F} \subset \mathbb{R}^N$, e.g. a class of digital signals
or digital images. How many linear measurements do we need to make about $f$ to be
able to recover $f$ to within precision $\epsilon$ in the Euclidean ($\ell_2$) metric?

This paper shows that if the objects of interest are sparse in a fixed basis or compressible,
then it is possible to reconstruct $f$ to within very high accuracy from a small
number of random measurements by solving a simple linear program. More precisely,
suppose that the $n$th largest entry of the vector $|f|$ (or of its coefficients in a fixed basis)
obeys $|f|_{(n)} \le R \cdot n^{-1/p}$, where $R > 0$ and $p > 0$. Suppose that we take measurements
$y_k = \langle f, X_k \rangle$, $k = 1, \ldots, K$, where the $X_k$ are $N$-dimensional Gaussian vectors with
independent standard normal entries. Then for each $f$ obeying the decay estimate above
for some $0 < p < 1$ and with overwhelming probability, our reconstruction $f^\sharp$, defined
as the solution to the constraints $y_k = \langle f^\sharp, X_k \rangle$ with minimal $\ell_1$ norm, obeys

$$\|f - f^\sharp\|_{\ell_2} \le C_p \cdot R \cdot (K/\log N)^{-r}, \qquad r = 1/p - 1/2.$$

There is a sense in which this result is optimal; it is generally impossible to obtain
a higher accuracy from any set of $K$ measurements whatsoever. The methodology
extends to various other random measurement ensembles; for example, we show that
similar results hold if one observes few randomly sampled Fourier coefficients of $f$. In
fact, the results are quite general and require only two hypotheses on the measurement
ensemble which are detailed.
1 Introduction and Overview of the Main Results

This paper considers the fundamental problem of recovering a finite signal $f \in \mathbb{R}^N$ from a
limited set of measurements. Specifically, given a class of signals $\mathcal{F} \subset \mathbb{R}^N$, one is interested
in the minimum number of linear measurements one has to make to be able to reconstruct
objects from $\mathcal{F}$ to within a fixed accuracy $\epsilon$, say, in the usual Euclidean $\ell_2$-distance. In
other words, how can one specify $K = K(\epsilon)$ linear functionals

$$y_k = \langle f, \psi_k \rangle, \qquad k \in \Omega, \tag{1.1}$$

where $(\psi_k)_{k \in \Omega}$ is a set of vectors with cardinality $|\Omega| = K$, so that it is possible to
reconstruct an object $f^\sharp$ from the data $(y_k)_{k \in \Omega}$ obeying

$$\|f - f^\sharp\|_{\ell_2} \le \epsilon, \tag{1.2}$$

for each element $f$ taken from $\mathcal{F}$? The primary goal is, of course, to find appropriate
functionals $(\psi_k)_{k \in \Omega}$ so that the required number $K$ of measurements is as small as possible.
In addition, we are also interested in concrete and practical recovery algorithms.

The new results in this paper will address this type of question for signals $f$ whose coefficients
with respect to a fixed reference basis obey a power-law type decay condition, and
for random measurements $(y_k)_{k \in \Omega}$ sampled from a specified ensemble. However, before we
discuss these results, we first recall some earlier results concerning signals of small support.
(See also Sections 1.7 and 9.2 for a more extensive discussion of related results.)



1.1 Exact Reconstruction of Sparse Signals 

In a previous article [11], the authors together with J. Romberg studied the recovery of
sparse signals from limited measurements; i.e. of signals which have relatively few nonzero
terms or whose coefficients in some fixed basis have relatively few nonzero entries. This
paper discussed some surprising phenomena, and we now review a special instance of those.
In order to do so, we first need to introduce the discrete Fourier transform which is given
by the usual formula$^1$

$$\hat{f}(k) := \frac{1}{\sqrt{N}} \sum_{t \in \mathbb{Z}_N} f(t) e^{-i 2\pi k t/N}, \tag{1.3}$$

where the frequency index $k$ ranges over the set $\mathbb{Z}_N := \{0, 1, \ldots, N-1\}$.
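As a quick sanity check on the $1/\sqrt{N}$ normalization in (1.3), the following sketch builds the corresponding Fourier matrix and compares it with a library FFT. It is an illustration only: NumPy and the helper names `dft_matrix` and `dft` are assumptions of this sketch and do not appear in the paper.

```python
import numpy as np

def dft_matrix(N):
    """N x N Fourier matrix F(k, t) = exp(-2*pi*i*k*t/N) / sqrt(N)."""
    k = np.arange(N).reshape(-1, 1)   # frequency index (rows)
    t = np.arange(N).reshape(1, -1)   # time index (columns)
    return np.exp(-2j * np.pi * k * t / N) / np.sqrt(N)

def dft(f):
    """Discrete Fourier transform with the 1/sqrt(N) normalization of (1.3)."""
    f = np.asarray(f, dtype=float)
    return dft_matrix(f.size) @ f

f = np.array([1.0, 0.0, -2.0, 0.0, 0.0, 3.0, 0.0, 0.0])
f_hat = dft(f)
# With this normalization the transform is unitary, so energy is preserved.
print(np.linalg.norm(f), np.linalg.norm(f_hat))
```

With this scaling the rows of the matrix are orthonormal, which is why Plancherel's identity holds without any extra factor of $N$.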

Suppose then that we wish to recover a signal $f \in \mathbb{R}^N$ made out of $|T|$ spikes, where the
set $T$ denotes the support of the signal

$$T := \{t : f(t) \neq 0\}.$$

We do not know where the spikes are located nor do we know their amplitudes. However, we
are given information about $f$ in the form of 'only' $K$ randomly sampled Fourier coefficients
$F_\Omega f := (\hat{f}(k))_{k \in \Omega}$, where $\Omega$ is a random set of $K$ frequencies sampled uniformly at random.

1 Strictly speaking, the Fourier transform is associated to an orthonormal basis in $\mathbb{C}^N$ rather than $\mathbb{R}^N$.
However, all of our analysis here extends easily to complex signals instead of real signals (except for some
negligible changes in the absolute constants $C$). For ease of exposition we shall focus primarily on real-valued
signals $f \in \mathbb{R}^N$, except when referring to the Fourier basis.
In [11] it was shown that $f$ could be reconstructed exactly from these data provided that
the expected number of frequency samples obeyed the lower bound

$$|T| \le \alpha \cdot (K/\log N) \tag{1.4}$$

for all sufficiently small $\alpha > 0$ (i.e. $\alpha \le \alpha_0$ for some small absolute constant $\alpha_0$). To recover
$f$ from $F_\Omega f$, we simply minimize the $\ell_1$-norm of the reconstructed signal

$$\min \|g\|_{\ell_1} := \sum_{t \in \mathbb{Z}_N} |g(t)|, \tag{1.5}$$

subject to the constraints

$$\hat{g}(k) = \hat{f}(k), \qquad \forall k \in \Omega.$$

Moreover, the probability that exact recovery occurs exceeds $1 - O(N^{-\rho/\alpha})$; here $\rho > 0$ is
a universal constant, and it is worth noting that the aforementioned reference gave explicit
values for this constant. The implied constant in the $O()$ notation is allowed to depend on
$\alpha$, but is independent of $N$. In short, exact recovery may be achieved by solving a simple
convex optimization problem (in fact, a linear program for real-valued signals), which is
a result of practical significance.

In a following paper, Candès and Romberg [13] extended these results and showed that
exact reconstruction phenomena hold for other synthesis/measurement pairs. For clarity of
presentation, it will be convenient to introduce some notation that we will use throughout
the remainder of the paper. We let $F_\Omega$ denote the $|\Omega|$ by $N$ matrix which specifies the set
of those $|\Omega|$ linear functionals which describe the measurement process so that the available
information $y$ about $f$ is of the form

$$y = F_\Omega f.$$

For instance, in our previous example, $F_\Omega$ is the $|\Omega|$ by $N$ partial Fourier matrix whose
rows are the sampled sinusoids

$$F_\Omega(k, t) := \frac{1}{\sqrt{N}} e^{-i 2\pi k t/N}, \qquad k \in \Omega, \; t \in \mathbb{Z}_N.$$

More generally, suppose that one is given an orthonormal basis $\Psi$

$$\Psi = (\psi_k(t))_{0 \le t, k < N},$$

and that one has available partial information about $f$ in the sense that we have knowledge
about a randomly selected set $\Omega \subset \{0, \ldots, N-1\}$ of coefficients in basis $\Psi$. For convenience,
define $\Psi$ to be the $N$ by $N$ synthesis matrix with entries $\Psi(t, k) := \psi_k(t)$. Then $F_\Omega$ is now
obtained from $\Psi^*$ by extracting the $|\Omega|$ rows with indices $k$ obeying $k \in \Omega$. Suppose as
before that there is another (fixed) orthonormal basis $\Phi$ in which the coefficients
$\theta(f) = (\theta_t(f))_{1 \le t \le N}$ of $f$ in this basis, defined by

$$\theta_t(f) := \langle f, \phi_t \rangle,$$

are sparse in the sense that only few of the entries of $\theta(f)$ are nonzero. Then it was shown
in [13] that with overwhelming probability, $f$ is the solution to

$$\min_g \|\theta(g)\|_{\ell_1} \quad \text{subject to} \quad F_\Omega g = F_\Omega f. \tag{1.6}$$
That is, exact reconstruction still occurs; the relationship here between the number of
nonzero terms in the basis $\Phi$ and the number of observed coefficients $|\Omega|$ depends upon
the incoherence between the two bases. The more incoherent, the fewer coefficients needed;
in the other direction, in the maximally coherent case, e.g. $\Psi = \Phi$, one in fact needs to
sample essentially all of the coefficients in order to ensure exact reconstruction (the same
holds true if $\Phi$ and $\Psi$ share only one element with nonzero inner product with $f$).

A special instance of these results concerns the case where the set of measurements is
generated completely at random; that is, we sample a random orthonormal basis of $\mathbb{R}^N$
and observe only the first $K$ coefficients in that basis (note that there is no advantage in
randomizing $\Omega$ as in Section 1.1 since the basis is already completely random). As before,
we let $F_\Omega$ be the submatrix enumerating those sampled vectors and solve (1.6). Then a
consequence of the methodology developed in this paper is that exact reconstruction occurs
with probability at least $1 - O(N^{-\rho/\alpha})$ (for a different value of $\rho$) provided that

$$\|\theta(f)\|_{\ell_0} \le \alpha \cdot (K/\log N), \tag{1.7}$$

where $\alpha > 0$ is sufficiently small, and the $\ell_0$-norm is of course the size of the support of the
vector $\theta$

$$\|\theta\|_{\ell_0} := |\{t : \theta_t \neq 0\}|,$$

see [23] for sharper results. In summary, $\ell_1$ minimization seems to recover sparse unknown signals in a
variety of different situations. The number of measurements simply needs to exceed the
number of unknown nonzero coefficients by a proper amount.

Observe that a nice feature of the random basis discussed above is its statistical invariance
by rotation. Let $\Phi$ be any basis so that $\theta(f)$ are the coefficients of $f$ in that basis:
$\theta(f) := \Phi^* f$. The constraints in (1.6) impose

$$F_\Omega \Phi \, \theta(g) = F_\Omega \Phi \, \theta(f)$$

and since the distribution of $F_\Omega \Phi$ is that of $F_\Omega$, the choice of the basis $\Phi$ is actually
irrelevant. Exact reconstruction occurs (with overwhelming probability) when the signal is
sparse in any fixed basis; of course, the recovery algorithm requires knowledge of this basis.

1.2 Power laws 

In general, signals of practical interest may not be supported in space or in a transform
domain on a set of relatively small size. Instead, the coefficients of elements taken from a
signal class decay rapidly, typically like a power law [20, 22]. We now give two examples
leaving mathematical rigor aside in the hope of being more concise.

• Smooth signals. It is well-known that if a continuous-time object has $s$ bounded
derivatives, then the $n$th largest entry of the wavelet or Fourier coefficient sequence is
of size about $1/n^{s+1/2}$ in one dimension and more generally, $1/n^{s/d+1/2}$ in $d$ dimensions
[22]. Hence, the decay of Fourier or wavelet coefficients of smooth signals exhibits a
power law.

• Signals with bounded variations. A popular model for signal analysis is the space of
objects with bounded variations. At the level of the continuum, the total-variation
norm of an object is approximately the $\ell_1$ norm of its gradient. In addition, there
are obvious discrete analogs for finite signals where the gradient is replaced by finite
differences. Now a norm which is almost equivalent to the total-variation norm is
the weak-$\ell_1$ norm in the wavelet domain; that is, the reordered wavelet coefficients of
a compactly supported object $f$ approximately decay like $1/n$ [?]. At the discrete
level, $\|f\|_{BV}$ essentially behaves like the $\ell_1$-norm of the Haar wavelet coefficients up
to a multiplicative factor of at most $\log N$. Moreover, it is interesting to note that
studies show that the empirical wavelet coefficients of photographs of natural scenes
actually exhibit the $1/n$-decay [?].

In fact, finding representations with rapidly decaying coefficients is a very active area of
research known as Computational Harmonic Analysis and there are of course many other
such examples. For instance, certain classes of oscillatory signals have rapidly decaying
Gabor coefficients [29], certain types of images with discontinuities along edges have rapidly
decaying curvelet coefficients [?], and so on.

Whereas [11] considered signals $f$ of small support, we now consider objects whose coefficients
in some basis decay like a power law. We fix an orthonormal basis $\Phi = (\phi_t)_{1 \le t \le N}$
(which we call the reference basis), and rearrange the entries $\theta_t(f) := \langle f, \phi_t \rangle$ of the coefficient
vector $\theta(f)$ in decreasing order of magnitude $|\theta|_{(1)} \ge |\theta|_{(2)} \ge \cdots \ge |\theta|_{(N)}$. We say that
$\theta(f)$ belongs to the weak-$\ell_p$ ball of radius $R$ (and we will sometimes write $f \in w\ell_p(R)$) for
some $0 < p < \infty$ and $R > 0$ if for each $1 \le n \le N$,

$$|\theta|_{(n)} \le R \cdot n^{-1/p}. \tag{1.8}$$

In other words, $p$ controls the speed of the decay: the smaller $p$, the faster the decay. The
condition (1.8) is also equivalent to the estimate

$$|\{t \in \mathbb{Z}_N : |\theta_t(f)| \ge \lambda\}| \le \frac{R^p}{\lambda^p}$$

holding for all $\lambda > 0$. We shall focus primarily on the case $0 < p < 1$.

It is well-known that the decay rate of the coefficients of $f$ is linked to the 'compressibility'
of $f$; compare the widespread use of transform coders in the area of lossy signal or image
compression. Suppose for instance that all the coefficients $(\theta_t(f))_{1 \le t \le N}$ are known and
consider the partial reconstruction $\theta_K(f)$ (where $1 \le K \le N$ is fixed) obtained by keeping
the $K$ largest entries of the vector $\theta(f)$ (and setting the others to zero). Then it immediately
follows from (1.8) that the approximation error obeys

$$\|\theta(f) - \theta_K(f)\|_{\ell_2} \le C_p \cdot R \cdot K^{-r}, \qquad r := 1/p - 1/2,$$

for some constant $C_p$ which only depends on $p$. And thus, it follows from Parseval that the
approximate signal $f_K$ obtained by keeping the $K$ largest coefficients in the expansion of $f$ in
the reference basis $\Phi$ obeys the same estimate, namely,

$$\|f - f_K\|_{\ell_2} \le C_p \cdot R \cdot K^{-r}, \tag{1.9}$$

where $C_p$ only depends on $p$.
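The $K$-term estimate above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, NumPy is assumed, and the explicit constant $C_p = (2/p - 1)^{-1/2}$ is one admissible choice obtained by comparing the tail sum $\sum_{n > K} n^{-2/p}$ with the integral $\int_K^\infty x^{-2/p}\,dx$ (valid for $p < 2$); the paper does not specify $C_p$.

```python
import numpy as np

def weak_lp_radius(theta, p):
    """Smallest R with |theta|_(n) <= R * n**(-1/p) for every n, as in (1.8)."""
    mags = np.sort(np.abs(theta))[::-1]            # decreasing rearrangement
    n = np.arange(1, mags.size + 1)
    return float(np.max(mags * n ** (1.0 / p)))

def best_k_term(theta, K):
    """Keep the K largest-magnitude entries of theta and set the others to zero."""
    theta = np.asarray(theta, dtype=float)
    keep = np.argsort(np.abs(theta))[::-1][:K]
    out = np.zeros_like(theta)
    out[keep] = theta[keep]
    return out

def k_term_bound(R, p, K):
    """R * K**(-r) / sqrt(2/p - 1) with r = 1/p - 1/2 (integral comparison, p < 2)."""
    r = 1.0 / p - 0.5
    return R * K ** (-r) / np.sqrt(2.0 / p - 1.0)

rng = np.random.default_rng(0)
N, p = 1024, 0.5
# Coefficients sitting exactly on the weak-l_p boundary, in random order and sign.
theta = rng.permutation(np.arange(1, N + 1) ** (-1.0 / p) * rng.choice([-1.0, 1.0], size=N))
R = weak_lp_radius(theta, p)
for K in (4, 16, 64):
    err = np.linalg.norm(theta - best_k_term(theta, K))
    print(K, err, k_term_bound(R, p, K))
```

Because the reference basis is orthonormal, the same numbers describe the error $\|f - f_K\|_{\ell_2}$ in the signal domain.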

1.3 Recovery of objects with power-law decay 

We now return to the setup we discussed earlier, where we select $K$ orthonormal vectors
$\psi_1, \ldots, \psi_K$ in $\mathbb{R}^N$ uniformly at random. Since applying a fixed orthonormal transform does
not change the problem, we may just as well assume that $\Phi$ is the identity and solve

$$(P_1) \qquad \min_{g \in \mathbb{R}^N} \|g\|_{\ell_1} \quad \text{subject to} \quad F_\Omega g = F_\Omega f, \tag{1.10}$$

where as usual, $F_\Omega f = (\langle f, \psi_k \rangle)_{k \in \Omega}$. In the setting where $f$ does not have small support,
we do not expect the recovery procedure (1.10) to recover $f$ exactly, but our first main
theorem asserts that it will recover $f$ approximately.

Note. From now on and for ease of exposition, we will take $(P_1)$ as our abstract recovery
procedure where it is understood that $f$ is the sparse object of interest to be recovered;
that is, $f$ could be a signal in $\mathbb{R}^N$ or its coefficients in some fixed basis $\Phi$.

Theorem 1.1 (Optimal recovery of $w\ell_p$ from random measurements) Suppose that
$f \in \mathbb{R}^N$ obeys (1.8) for some fixed $0 < p < 1$ or $\|f\|_{\ell_1} \le R$ for $p = 1$, and let $\alpha > 0$ be
a sufficiently small number (less than an absolute constant). Assume that we are given
$K$ random measurements $F_\Omega f$ as described above. Then with probability 1, the minimizer
$f^\sharp$ to (1.10) is unique. Furthermore, with probability at least $1 - O(N^{-\rho/\alpha})$, we have the
approximation

$$\|f - f^\sharp\|_{\ell_2} \le C_{p,\alpha} \cdot R \cdot (K/\log N)^{-r}, \qquad r = 1/p - 1/2. \tag{1.11}$$

Here, $C_{p,\alpha}$ is a fixed constant depending on $p$ and $\alpha$ but not on anything else. The implicit
constant in $O(N^{-\rho/\alpha})$ is allowed to depend on $\alpha$.

The result of this theorem may seem surprising. Indeed, (1.11) says that if one makes
$O(K \log N)$ random measurements of a signal $f$, and then reconstructs an approximate
signal from this limited set of measurements in a manner which requires no prior knowledge
or assumptions on the signal (other than that it perhaps obeys some sort of power-law decay
with unknown parameters), one still obtains a reconstruction error which is as good
as the one obtained by knowing everything about $f$ and selecting the $K$ largest entries
of the coefficient vector $\theta(f)$; thus the amount of "oversampling" incurred by this random
measurement procedure compared to the optimal sampling for this level of error is only
a multiplicative factor of $O(\log N)$. To avoid any ambiguity, when we say that no prior
knowledge or information is required about the signal, we mean that the reconstruction
algorithm does not depend upon unknown quantities such as $p$ or $R$.
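To make the statement concrete, here is a small numerical experiment in the spirit of Theorem 1.1. It is a sketch under stated assumptions, not the authors' code: it assumes NumPy and SciPy's `linprog`, uses the hypothetical helper names `basis_pursuit` and `power_law_signal`, and writes the $\ell_1$ problem as a linear program through the standard splitting $g = g^+ - g^-$ with $g^\pm \ge 0$. The printed rate omits the unspecified constant $C_{p,\alpha}$, so the two numbers are only meant to be compared in order of magnitude.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||g||_1 subject to A g = y, as an LP in g = g_plus - g_minus, g_plus, g_minus >= 0."""
    K, N = A.shape
    c = np.ones(2 * N)                       # sum of g_plus and g_minus equals ||g||_1 at the optimum
    A_eq = np.hstack([A, -A])                # A (g_plus - g_minus) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:N] - res.x[N:]

def power_law_signal(N, p, R, rng):
    """Random-sign signal whose n-th largest entry equals R * n**(-1/p), in random positions."""
    mags = R * np.arange(1, N + 1) ** (-1.0 / p)
    signs = rng.choice([-1.0, 1.0], size=N)
    return rng.permutation(mags * signs)

rng = np.random.default_rng(0)
N, K, p, R = 256, 64, 0.5, 1.0
f = power_law_signal(N, p, R, rng)
A = rng.standard_normal((K, N)) / np.sqrt(N)     # Gaussian measurements
f_sharp = basis_pursuit(A, A @ f)                # uses neither p nor R

r = 1.0 / p - 0.5
err = np.linalg.norm(f - f_sharp)
rate = R * (K / np.log(N)) ** (-r)               # right-hand side of (1.11) without C_{p,alpha}
print(f"l2 error = {err:.3e}, R (K/log N)^(-r) = {rate:.3e}")
```

Note that the decoder never sees $p$ or $R$; they are used only to generate the test signal and to evaluate the reference rate.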

Below, we will argue that we cannot, in general, design a set of $K$ measurements that would
allow essentially better reconstruction errors by any method, no matter how intractable. As
we will see later, Theorem 1.1 is a special case of Theorem 1.4 below (but for the uniqueness
claim which is proved in Section 2).

1.4 Precedents 

A natural question is whether the number of random samples we identified in Theorem 1.4
is, in some sense, optimal. Or would it be possible to obtain similar accuracies with far
fewer observations? To make things concrete, suppose we are interested in the recovery of
objects with bounded $\ell_1$-norm, e.g. the $\ell_1$-ball

$$B_1 := \{f : \|f\|_{\ell_1} \le 1\}.$$

Suppose we can make $K$ linear measurements about $f \in B_1$ of the form $y = F_\Omega f$. Then
what is the best measurement/reconstruction pair so that the error

$$E_K(B_1) = \sup_{f \in B_1} \|f - D(y)\|_{\ell_2}, \qquad y = F_\Omega f, \tag{1.12}$$

is minimum? In (1.12), $D$ is the reconstruction algorithm.

To develop an insight about the intrinsic difficulty of our problem, consider the following
geometric picture. Suppose we take $K$ measurements $F_\Omega f$; this says that $f$ belongs to an
affine space $f_0 + S$ where $S$ is a linear subspace of co-dimension less or equal to $K$. Now the
data available for the problem cannot distinguish any object belonging to that plane. Assume
$f$ is known to belong to the $\ell_1$-ball $B_1$, say; then the data cannot distinguish between any
two points in the intersection $B_1 \cap (f_0 + S)$. Therefore, any reconstruction procedure $f^*(y)$
based upon $y = F_\Omega f$ would obey

$$\sup_{f \in B_1} \|f - f^*\|_{\ell_2} \ge \frac{\mathrm{diam}(B_1 \cap S)}{2}. \tag{1.13}$$

(When we take the supremum over all $f$, we may just assume that $f$ is orthogonal to the
measurements ($y = 0$) since the diameter will of course be maximal in that case.) The goal
is then to find $S$ such that the above diameter is minimal. This connects with the agenda
of approximation theory where this problem is known as finding the Gelfand $n$-width of the
class $B_1$ [?], as we explain below.

The Gelfand numbers of a set $\mathcal{F}$ are defined as

$$d_K(\mathcal{F}) = \inf_S \Big\{ \sup_{f \in \mathcal{F}} \|P_S f\| : \mathrm{codim}(S) < K \Big\}, \tag{1.14}$$

where $P_S$ is, of course, the orthogonal projection on the subspace $S$. Then it turns out
that $d_K(\mathcal{F}) \le E_K(\mathcal{F}) \le d_K(\mathcal{F})$. Now a seminal result of Kashin [37], improved by
Garnaev and Gluskin [?], shows that for the $\ell_1$ ball, the Gelfand numbers obey



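$$C \sqrt{\frac{\log(N/K) + 1}{K}} \le d_K(B_1) \le C' \sqrt{\frac{\log(N/K) + 1}{K}}, \tag{1.15}$$

where $C$, $C'$ are universal constants. Gelfand numbers are also approximately known for
weak-$\ell_p$ balls.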

Viewed differently, Kashin, Garnaev and Gluskin assert that with $K$ measurements, the
minimal reconstruction error (1.12) one can hope for is bounded below by a constant times
$(K/\log(N/K))^{-1/2}$. In this sense, Theorem 1.4 is optimal (within a multiplicative constant)
at least for $K \asymp N^{\beta}$, with $\beta < 1$.$^2$ Kashin also shows that if we take a random
projection, $\mathrm{diam}(B_1 \cap S)$ is bounded above by the right-hand side of (1.15). We would also
like to emphasize that similar types of recovery have also been known to be possible in the
literature of theoretical computer science, at least in principle, for certain types of random
measurements. On the one hand, finding the Chebyshev center of $B_1 \cap S$ is a
convex problem, which would yield a near-optimal reconstruction algorithm. On the other
hand, this problem is computationally intractable when $p < 1$. Further, one would need to
know $p$ and the radius of the weak-$\ell_p$ ball, which is not realistic in practical applications.

2 Note added in proof: since submission of this paper, we proved in [12] that Theorem 1.4 holds with
$\log(N/K)$ instead of $\log N$ in (1.11).
The novelty here is that the information about $f$ can be retrieved from those random
coefficients by solving a simple linear program (1.10), and that the decoding algorithm
adapts automatically to the weak-$\ell_p$ signal class, without knowledge thereof. Minimizing
the $\ell_1$-norm gives nearly the best possible reconstruction error simultaneously over a wide
range of sparse classes of signals; no information about $p$ and the radius $R$ is required. In
addition and as we will see next, another novelty is the general nature of the measurement
ensemble.

It should also be mentioned that when the measurement ensemble consists of Fourier coefficients
on a random arithmetic progression, a very fast recovery algorithm that gives
near-optimal results for arbitrary $\ell_2$ data has recently been given in [33]. Since the preparation
of this manuscript, we have learnt that results closely related to those in this paper
have appeared in [21]. We compare our results with both these works in Section 9.2.

1.5 Other Measurement Ensembles 

Underlying our results is a powerful machinery essentially relying on properties of random
matrices which gives us very precise tools allowing us to quantify how much of a signal one can
reconstruct from random measurements. In fact, Theorem 1.1 holds for other measurement
ensembles. For simplicity, we shall consider three types of measured data:

• The Gaussian ensemble: Here, we suppose that $1 \le K \le N$ and $\Omega := \{1, \ldots, K\}$
are fixed, and the entries of $F_\Omega$ are identically and independently sampled from a
standard normal distribution

$$F_\Omega(k, t) := \frac{1}{\sqrt{N}} X_{kt}, \qquad X_{kt} \text{ i.i.d. } N(0, 1).$$

The Gaussian ensemble is invariant by rotation since for any fixed orthonormal matrix
$\Phi$, the distribution of $F_\Omega$ is that of $F_\Omega \Phi$.

• The binary ensemble: Again we take $1 \le K \le N$ and $\Omega := \{1, \ldots, K\}$ to be fixed.
But now we suppose that the entries of $F_\Omega$ are identically and independently sampled
from a symmetric Bernoulli distribution

$$F_\Omega(k, t) := \frac{1}{\sqrt{N}} X_{kt}, \qquad X_{kt} \text{ i.i.d. }, \quad P(X_{kt} = \pm 1) = 1/2.$$

• The Fourier ensemble: This ensemble was discussed earlier, and is obtained by
randomly sampling rows from the orthonormal $N$ by $N$ Fourier matrix
$F(k, t) = \exp(-i 2\pi k t/N)/\sqrt{N}$. Formally, we let $0 < \tau < 1$ be a fixed parameter, and then let
$\Omega$ be the random set defined by

$$\Omega = \{k : I_k = 1\},$$

where the $I_k$'s are i.i.d. Bernoulli variables with $P(I_k = 1) = \tau$. We then let
$R_\Omega : \ell_2(\mathbb{Z}_N) \to \ell_2(\Omega)$ be the restriction map $(R_\Omega g)(k) = g(k)$ for all $k \in \Omega$ (so that the
adjoint $R_\Omega^* : \ell_2(\Omega) \to \ell_2(\mathbb{Z}_N)$ is the embedding obtained by extending by zero outside
of $\Omega$), and set

$$F_\Omega := R_\Omega F.$$

In this case, the role of $K$ is played by the quantity $K := E(|\Omega|) = \tau N$. (In fact $|\Omega|$
is usually very close to $K$; see Lemma 6.6.)
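The three ensembles above are straightforward to simulate. The following is a minimal sketch assuming NumPy; the function names are hypothetical, and the $1/\sqrt{N}$ scaling follows the definitions in the list.

```python
import numpy as np

def gaussian_ensemble(K, N, rng):
    """K x N matrix with entries X_kt / sqrt(N), X_kt i.i.d. N(0, 1)."""
    return rng.standard_normal((K, N)) / np.sqrt(N)

def binary_ensemble(K, N, rng):
    """K x N matrix with i.i.d. entries +1/sqrt(N) or -1/sqrt(N), each with probability 1/2."""
    return rng.choice([-1.0, 1.0], size=(K, N)) / np.sqrt(N)

def fourier_ensemble(tau, N, rng):
    """Rows of the N x N Fourier matrix, each kept independently with probability tau.

    Returns (F_Omega, Omega). The number of rows |Omega| is random with mean tau * N.
    """
    keep = rng.random(N) < tau                     # Bernoulli indicators I_k
    omega = np.flatnonzero(keep)                   # Omega = {k : I_k = 1}
    t = np.arange(N)
    F_omega = np.exp(-2j * np.pi * np.outer(omega, t) / N) / np.sqrt(N)
    return F_omega, omega

rng = np.random.default_rng(0)
N, K = 128, 32
G = gaussian_ensemble(K, N, rng)
B = binary_ensemble(K, N, rng)
F_omega, omega = fourier_ensemble(K / N, N, rng)
print(G.shape, B.shape, F_omega.shape)             # the last row count fluctuates around K
```

Only the Fourier ensemble has a random number of rows, which is why the text works with the expected count $K = \tau N$ in that case.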
Just as Theorem 1.1 suggests, this paper will show that it is possible to derive recovery
rates for all three measurement ensembles. The ability to recover a signal $f$ from partial
random measurements depends on key properties of those measurement ensembles that we
now discuss.

1.6 Axiomatization 

We shall now unify the treatment of all these ensembles by considering an abstract measurement
matrix $F_\Omega$, which is a random $|\Omega| \times N$ matrix following some probability distribution
(e.g. the Gaussian, Bernoulli, or Fourier ensembles). We also allow the number of measurements
$|\Omega|$ to be a random variable taking values between 1 and $N$, and set $K := E(|\Omega|)$,
the expected number of measurements. For ease of exposition we shall restrict our attention
to real-valued matrices $F_\Omega$; the modifications required to cover complex matrices such as
those given by the Fourier ensemble are simple. We remark that we do not assume that
the rows of the matrix $F_\Omega$ form an orthogonal family.

This section introduces two key properties on $F_\Omega$ which, if satisfied, will guarantee that
the solution to the problem (1.10) will be a good approximation to the unknown signal $f$
in the sense of Theorem 1.1.

First, as in [11], our arguments rely, in part, on the quantitative behavior of the singular
values of the matrices $F_{\Omega T} := F_\Omega R_T^* : \ell_2(T) \to \ell_2(\Omega)$ which are the $|\Omega|$ by $|T|$ matrices
obtained by extracting $|T|$ columns from $F_\Omega$ (corresponding to indices in a set $T$). More
precisely, we shall need to assume the following hypothesis concerning the minimum and
maximum eigenvalues of the square matrix $F_{\Omega T}^* F_{\Omega T} : \ell_2(T) \to \ell_2(T)$.

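Definition 1.2 (UUP: Uniform Uncertainty Principle) We say that a measurement
matrix $F_\Omega$ obeys the uniform uncertainty principle with oversampling factor $\lambda$ if for every
sufficiently small $\alpha > 0$, the following statement is true with probability at least$^3$
$1 - O(N^{-\rho/\alpha})$ for some fixed positive constant $\rho > 0$: for all subsets $T$ such that

$$|T| \le \alpha \cdot K/\lambda, \tag{1.16}$$

the matrix $F_{\Omega T}$ obeys the bounds

$$\frac{1}{2} \cdot \frac{K}{N} \le \lambda_{\min}(F_{\Omega T}^* F_{\Omega T}) \le \lambda_{\max}(F_{\Omega T}^* F_{\Omega T}) \le \frac{3}{2} \cdot \frac{K}{N}. \tag{1.17}$$

Note that (1.17) is equivalent to the inequality

$$\frac{1}{2} \frac{K}{N} \|f\|_{\ell_2}^2 \le \|F_\Omega f\|_{\ell_2}^2 \le \frac{3}{2} \frac{K}{N} \|f\|_{\ell_2}^2 \tag{1.18}$$

holding for all signals $f$ with support size less or equal to $\alpha K/\lambda$.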

There is nothing special about the constants 1/2 and 3/2 in (1.17), which we merely selected
to make the UUP as concrete as possible. Apart from the size of certain numerical constants
(in particular, implied constants in the $O()$ notation), nothing in our arguments depends
on this special choice, and we could replace the pair $(1/2, 3/2)$ with a pair $(a, b)$ where $a$
and $b$ are bounded away from zero and infinity. This remark is important to keep in mind
when we will discuss the UUP for binary matrices.

3 Throughout this paper, we allow implicit constants in the $O()$ notation to depend on $\alpha$.

To understand the content of (1.17), suppose that $F_\Omega$ is the partial Fourier transform and
suppose we have a signal $f$ supported on a set $T$ obeying $|T| \le \alpha K/\lambda$. Then (1.17) says
that $\|\hat{f}\|_{\ell_2(\Omega)}$ is at most $\sqrt{3K/2N} \, \|f\|_{\ell_2}$ with overwhelming probability. Comparing this
with Plancherel's identity $\|\hat{f}\|_{\ell_2(\mathbb{Z}_N)} = \|f\|_{\ell_2}$, we see that (with overwhelming probability)
a sparse signal $f$ cannot be concentrated in frequency on $\Omega$ regardless of the exact support of
$f$, unless $K$ is comparable to $N$. This justifies the terminology "Uncertainty Principle". A
subtle but crucial point here is that, with overwhelming probability, we obtain the estimate
(1.17) for all sets $T$ obeying (1.16); this is stronger than merely asserting that each set $T$
obeying (1.16) obeys (1.17) separately with overwhelming probability, since in the latter
case the number of sets $T$ obeying (1.16) is quite large and thus the union of all the
exceptional probability events could also be quite large. This justifies the terminology
"Uniform". As we will see in Section 5, the uniform uncertainty principle hypothesis is
crucial to obtain estimates about the $\ell_2$ distance between the reconstructed signal $f^\sharp$ and
the unknown signal $f$.
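The eigenvalue bounds (1.17) are easy to probe numerically on individual supports. The sketch below, which assumes NumPy and uses hypothetical helper names, draws a few random supports $T$ and checks whether the extreme eigenvalues of $F_{\Omega T}^* F_{\Omega T}$ lie in $[K/2N, 3K/2N]$. It is a spot check only: the UUP quantifies over all supports of a given size simultaneously, which sampling cannot certify.

```python
import numpy as np

def gram_eigen_range(F, T):
    """Smallest and largest eigenvalues of F_T^* F_T, where F_T holds the columns of F indexed by T."""
    F_T = F[:, T]
    eig = np.linalg.eigvalsh(F_T.conj().T @ F_T)   # Hermitian matrix, eigenvalues in ascending order
    return float(eig[0]), float(eig[-1])

def uup_holds_on_sample(F, size, trials, rng):
    """Check the bounds (1.17) on `trials` randomly drawn supports of the given size."""
    K, N = F.shape
    lo, hi = 0.5 * K / N, 1.5 * K / N
    for _ in range(trials):
        T = rng.choice(N, size=size, replace=False)
        lam_min, lam_max = gram_eigen_range(F, T)
        if lam_min < lo or lam_max > hi:
            return False
    return True

rng = np.random.default_rng(0)
K, N = 512, 1024
F = rng.standard_normal((K, N)) / np.sqrt(N)       # Gaussian ensemble
print(uup_holds_on_sample(F, size=2, trials=10, rng=rng))
```

Verifying the bound on every support of a given size is combinatorial in nature, which is one reason the uniform statement requires a genuine proof rather than simulation.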

The UUP is similar in spirit to several standard principles and results regarding random
projection, such as the famous Johnson-Lindenstrauss lemma [35] regarding the preservation
of distances between a finite number of points when randomly projected to a medium-dimensional
space. There are however a number of notable features of the UUP that
distinguish it from more standard properties of random projections. Firstly, there is a wide
latitude in how to select the measurement ensemble $F_\Omega$; for instance, the entries do not
have to be independent or Gaussian, and it is even conceivable that interesting classes of
completely deterministic matrices obeying the UUP could be constructed. Secondly, the
estimate (1.17) has to hold for all subsets $T$ of a certain size; for various reasons in our
applications, it would not be enough to have (1.17) merely on an overwhelming proportion
of such sets $T$. This makes it somewhat trickier for us to verify the UUP; in the Fourier
case we shall be forced to use some entropy counting methods of Bourgain.

We now introduce a second hypothesis (which appears implicitly in [11]), whose significance
is explained below.

Definition 1.3 (ERP: Exact Reconstruction Principle) We say that a measurement
matrix $F_\Omega$ obeys the exact reconstruction principle with oversampling factor $\lambda$ if for all
sufficiently small $\alpha > 0$, each fixed subset $T$ obeying (1.16) and each 'sign' vector $\sigma$ defined
on $T$, $|\sigma(t)| = 1$, there exists with overwhelmingly large probability a vector $P \in \mathbb{R}^N$ with
the following properties:

(i) $P(t) = \sigma(t)$, for all $t \in T$;

(ii) $P$ is a linear combination of the rows of $F_\Omega$ (i.e. $P = F_\Omega^* V$ for some vector $V$ of
length $|\Omega|$);

(iii) and $|P(t)| \le \frac{1}{2}$ for all $t \in T^c := \{0, \ldots, N-1\} \setminus T$.

By 'overwhelmingly large', we mean that the probability be at least $1 - O(N^{-\rho/\alpha})$ for some
fixed positive constant $\rho > 0$ (recall that the implied constant is allowed to depend on $\alpha$).
Section 2 will make clear that ERP is crucial to check that the reconstruction $f^\sharp$ is close,
in the $\ell_1$-norm, to the vector obtained by truncating $f$, keeping only its largest entries.
Note that, in contrast to the UUP, in ERP we allow a separate exceptional event of small
probability for each set $T$, rather than having a uniform event of high probability that
covers all $T$ at once. There is nothing special about the factor 1/2 in (iii); any quantity $\beta$
strictly between 0 and 1 would suffice here.

To understand how ERP relates to our problem, suppose that $f$ is a signal supported on a
set $T$. Then using duality theory, it was shown in [11] that the solution to (1.10) is exact if
and only if there exists a $P$ with the above properties for $\sigma(t) = \mathrm{sgn}(f)(t)$, hence the name.
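For a given matrix, support and sign pattern, one natural candidate for the vector $P$ is the minimum-energy interpolant $P = F_\Omega^* F_{\Omega T} (F_{\Omega T}^* F_{\Omega T})^{-1} \sigma$, which satisfies properties (i) and (ii) by construction whenever $F_{\Omega T}$ has full column rank. The sketch below computes it and tests property (iii). This particular construction and the helper names are assumptions of the example (NumPy is assumed as well); Definition 1.3 only asks that some such $P$ exist, so a failed check for this candidate does not by itself rule out ERP.

```python
import numpy as np

def dual_certificate(F, T, sigma):
    """Minimum-energy P in the row space of F with P(t) = sigma(t) for t in T.

    P = F^* V with V = F_T (F_T^* F_T)^{-1} sigma; F_T must have full column rank.
    """
    sigma = np.asarray(sigma, dtype=float)
    F_T = F[:, T]
    V = F_T @ np.linalg.solve(F_T.conj().T @ F_T, sigma)
    return F.conj().T @ V

def erp_check(F, T, sigma, level=0.5):
    """Return (P, ok), where ok tells whether |P(t)| <= level for every t outside T."""
    P = dual_certificate(F, T, sigma)
    off_support = np.ones(F.shape[1], dtype=bool)
    off_support[T] = False
    return P, bool(np.all(np.abs(P[off_support]) <= level))

rng = np.random.default_rng(0)
K, N = 400, 512
F = rng.standard_normal((K, N)) / np.sqrt(N)       # Gaussian ensemble
T = [5, 300]
sigma = [1.0, -1.0]
P, ok = erp_check(F, T, sigma)
print(np.round(P[T], 6), ok)
```

When the check succeeds for $\sigma = \mathrm{sgn}(f)$ on the support of $f$, the duality argument referred to above says that $f$ is the solution of (1.10).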

The hypotheses UUP and ERP are closely related. For instance, one can use UUP to
prove a statement very similar to ERP, but in the $\ell_2$ norm rather than the $\ell_\infty$ norm; see
Corollary 3.1 below. One also has an implication of the form UUP $\Longrightarrow$ ERP for generic
signals $f$ assuming an additional weaker hypothesis WERP, see Section 5. In [11] and in
[13], the property UUP was used (together with some additional arguments) to deduce$^4$
ERP.

We now are in position to state the main result of this paper. 

Theorem 1.4 Let $F_\Omega$ be a measurement process such that UUP and ERP hold with oversampling
factors $\lambda_1$ and $\lambda_2$ respectively. Put $\lambda = \max(\lambda_1, \lambda_2)$ and assume $K \ge \lambda$. Suppose
that $f$ is a signal in $\mathbb{R}^N$ obeying (1.8) for some fixed $0 < p < 1$ or $\|f\|_{\ell_1} \le R$ for $p = 1$,
and let $r := 1/p - 1/2$. Then for any sufficiently small $\alpha$, any minimizer $f^\sharp$ to (1.10) will
obey

$$\|f - f^\sharp\|_{\ell_2} \le C_{p,\alpha} \cdot R \cdot (K/\lambda)^{-r} \tag{1.19}$$

with probability at least $1 - O(N^{-\rho/\alpha})$. The implied constant may depend on $\rho$ and $\alpha$ but
not on anything else.

In this paper, we will show that the Gaussian and binary ensembles mentioned earlier obey
UUP and ERP with $\lambda = \log N$, while the Fourier ensemble obeys UUP with $\lambda = (\log N)^6$
and ERP with $\lambda = \log N$. Hence given an object $f \in w\ell_p(R)$, we prove that if we collect
$K \ge \log N$ Gaussian or binary measurements, then

$$\|f - f^\sharp\|_{\ell_2} \le O(1) \cdot R \cdot (K/\log N)^{-r} \tag{1.20}$$

except for a set of probability at most $O(N^{-\rho/\alpha})$. For randomly sampled frequency data
(with at least $(\log N)^6$ frequencies being sampled), the quality of the reconstruction now
reads as

$$\|f - f^\sharp\|_{\ell_2} \le O(1) \cdot R \cdot (K/(\log N)^6)^{-r}. \tag{1.21}$$

We prove this theorem in Section 3.2. Observe that our earlier Theorem 1.1 follows from
(1.20) and is thus a special case of Theorem 1.4. Indeed, for a fixed $F_\Omega$, (1.10) is equivalent
to

$$\min_g \|g\|_{\ell_1} \quad \text{subject to} \quad P_\Omega g = P_\Omega f,$$

4 Note added in proof: In a sequel [12] to this paper, we show that a slight strengthening of the UUP
(in which the constants 1/2 and 3/2 are replaced by other numerical constants closer to 1) in fact implies ERP
unconditionally.
where $P_\Omega$ is the orthogonal projection onto the span of the rows of $F_\Omega$. Now suppose as in
the Gaussian ensemble that $F_\Omega$ is a matrix with i.i.d. $N(0, 1/N)$ entries; then $P_\Omega$ is simply
the projection onto a random plane of dimension $K$ (with probability 1) which, of course,
is the setup of Theorem 1.1.

1.7 About the $\ell_1$ norm

We would like to emphasize that the simple nonlinear reconstruction strategy which minimizes
the $\ell_1$-norm subject to consistency with the measured observations is well-known
in the literature of signal processing. For example in the mid-eighties, Santosa and Symes
proposed this rule to reconstruct spike trains from incomplete data [?]; see also [?]. We
would also like to point out connections with total-variation approaches in the literature
of image processing [48], [?], which are methods based on the minimization of the $\ell_1$-norm
of the discrete gradient. Note that minimizing the $\ell_1$-norm is very different from standard
least squares (i.e. $\ell_2$) minimization procedures. With incomplete data, the least squares
approach would simply set to zero the 'unobserved' coefficients. Consider the Fourier case,
for instance. The least-squares solution would set to zero all the unobserved frequencies
so that the minimizer would have much smaller energy than the original signal. As is well
known, the minimizer would also contain a lot of artifacts.

More recently, ^i-minimization perhaps best known under the name of Basis Pursuit, has 
been proposed as a convex alternative to the combinatorial norm £q, which simply counts 
the number of nonzero entries in a vector, for synthesizing signals as sparse superpositions 
of waveforms [E3 . Interestingly, these methods provided great practical success ^1 E] and 
were shown to enjoy remarkable theoretical properties and to be closely related to various 
kinds of uncertainty principles [212 123 EU E] • 

On the practical side, an ^i-norm minimization problem (for real-valued signals) can be 
recast as a linear program (LP) 0]. For example, (|1.1U|) is equivalent to minimizing Y^t u (t) 
subject to Ffig = Fq/ and —u(t) < g(t) < u(t) for all t. This is interesting since there is a 
wide array of ever more effective computational strategies for solving LPs. 
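
The recasting described above can be written out explicitly. The following is a minimal sketch in plain Python, under the assumption that the unknowns are stacked as $z = (g, u) \in R^{2N}$; the function name `l1_as_linear_program` and the returned array layout are illustrative choices, not something prescribed by the paper. It only builds the LP data and checks that $g = f$, $u = |f|$ is feasible with objective value $\|f\|_{\ell_1}$; it does not solve the program.

```python
def l1_as_linear_program(F, y):
    """Build the LP  min c.z  s.t.  A_eq z = b_eq,  A_ub z <= b_ub,
    where z = (g, u) stacks the signal g and the auxiliary bounds u.
    F is a K x N real matrix (list of rows), y the K measurements."""
    K, N = len(F), len(F[0])
    c = [0.0] * N + [1.0] * N                      # objective: sum_t u(t)
    A_eq = [list(row) + [0.0] * N for row in F]    # consistency: F g = y
    b_eq = list(y)
    A_ub, b_ub = [], []
    for t in range(N):
        upper = [0.0] * (2 * N)                    #  g(t) - u(t) <= 0
        upper[t], upper[N + t] = 1.0, -1.0
        lower = [0.0] * (2 * N)                    # -g(t) - u(t) <= 0
        lower[t], lower[N + t] = -1.0, -1.0
        A_ub += [upper, lower]
        b_ub += [0.0, 0.0]
    return c, A_eq, b_eq, A_ub, b_ub


def dot(row, z):
    return sum(a * b for a, b in zip(row, z))


# Toy data (hypothetical): a 2 x 4 measurement matrix and a sparse signal.
F = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, -1.0]]
f = [0.0, 2.0, 0.0, -1.0]
y = [dot(row, f) for row in F]

c, A_eq, b_eq, A_ub, b_ub = l1_as_linear_program(F, y)
z = f + [abs(v) for v in f]                        # feasible point g = f, u = |f|
print(dot(c, z))                                   # objective equals the l1 norm of f
```

If these arrays are handed to a general-purpose LP solver, the entries of $g$ must be declared free (unbounded below); some solvers assume nonnegative variables by default.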

1.8 Applications 

In many applications of practical interest, we often wish to reconstruct an object (a discrete 
signal, a discrete image and so on) from incomplete samples and it is natural to ask how 
much one can hope to recover. Actually, this work was motivated by the problem of recon- 
structing biomedical images from vastly undersampled Fourier data. Of special interest are 
problems in magnetic resonance (MR) angiography but it is expected that our methodology 
and algorithms will be suitable for other MR imagery, and to other acquisition techniques, 
such as tomography. In MR angiography, however, we observe few Fourier samples, and 
therefore if the images of interest are compressible in some transform domain such as in the 
wavelet domain for example, then $\ell_1$-based reconstructions might be especially well-suited.

Another application of these ideas might be to view the measurement/reconstruction
procedure as a kind of lossy encoder/decoder pair where the measurement process would play
the role of an encoder and the linear program $(P_1)$ that of a decoder. We postpone this
discussion to Section 8.
1.9 Organization of the Paper



This paper is roughly divided into three parts and is organized as follows. The first part
(Sections 2 and 3) shows how UUP together with ERP give our main result, namely,
Theorem 1.4. In Section 2, we establish that the solution to (1.10) is in some sense stable
in the $\ell_1$-norm, while Section 3 introduces some $\ell_2$-theory and proves our main result. In
the second part (Sections 4, 5, 6 and 7), we show that all three measurement ensembles
obey UUP and ERP. Section 4 studies singular values of random matrices and shows that
the UUP holds for the Gaussian and binary ensembles. Section 5 presents a weaker ERP
which, in practice, is far easier to check. In Section 6, we prove that all three ensembles
obey the ERP. In the case of the Fourier ensemble, the strategy for proving the UUP is
very different from that for Gaussian and binary measurements, and is presented in a separate
Section 7. Finally, we will argue in the third part of the paper that one can think of the
random measurement process as some kind of universal encoder (Section 8) and briefly
discuss some of its very special properties. We conclude with a discussion section (Section
9) whose main purpose is to outline further work and point out connections with the work
of others. The Appendix provides proofs of technical lemmas.

2 Stability in the $\ell_1$-norm

In this section, we establish $\ell_1$-properties of any minimizer to the problem $(P_1)$, when the
initial signal is mostly concentrated (in an $\ell_1$ sense) on a small set.

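Lemma 2.1 Assume that the measurement matrix $F_\Omega$ obeys ERP. We let $f$ be a fixed
signal of the form $f = f_0 + h$ where $f_0$ is a signal supported on a set $T$ whose size obeys
(1.16). Then with probability at least $1 - O(N^{-\rho/\alpha})$, any $\ell_1$-minimizer $f^\sharp$ of (1.10) obeys

\[ \|f^\sharp \cdot 1_{T^c}\|_{\ell_1} \le 4 \|h\|_{\ell_1}. \tag{2.1} \]

Proof Since $f$ is feasible for $(P_1)$, the minimality of $f^\sharp$ immediately gives

\[ \|f^\sharp\|_{\ell_1} \le \|f\|_{\ell_1} \le \|f_0\|_{\ell_1} + \|h\|_{\ell_1}. \tag{2.2} \]

Now because ERP holds, one can construct, with the required probability, a function
$P = F_\Omega^* V$ for some $V \in \ell_2(K)$ such that $P = \mathrm{sgn}(f_0)$ on $T$ and $|P(t)| \le 1/2$ away from $T$.
Observe the identity

\[ \langle f^\sharp, P \rangle = \langle f^\sharp, F_\Omega^* V \rangle = \langle F_\Omega f^\sharp, V \rangle = \langle F_\Omega (f_0 + h), V \rangle = \langle f_0 + h, F_\Omega^* V \rangle = \langle f_0 + h, P \rangle. \]

Then on the one hand

\[ \langle f^\sharp, P \rangle = \langle f_0, P \rangle + \langle h, P \rangle \ge \|f_0\|_{\ell_1} - \|h\|_{\ell_1}, \]

while on the other hand, the bounds on $P$ give

\[ |\langle f^\sharp, P \rangle| \le \sum_{t \in T} |f^\sharp(t)| + \frac{1}{2} \sum_{t \in T^c} |f^\sharp(t)| = \|f^\sharp\|_{\ell_1} - \frac{1}{2} \|f^\sharp \cdot 1_{T^c}\|_{\ell_1}. \]

To conclude, we established that

\[ \|f_0\|_{\ell_1} - \|h\|_{\ell_1} \le \|f^\sharp\|_{\ell_1} - \frac{1}{2} \|f^\sharp \cdot 1_{T^c}\|_{\ell_1}, \]

and together with (2.2) proved that

\[ \|f^\sharp \cdot 1_{T^c}\|_{\ell_1} \le 4 \|h\|_{\ell_1} \]

as claimed. ■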

This lemma says that any minimizer is approximately concentrated on the same set as the
signal $f$. Indeed, suppose that $f$ obeys (1.8) and consider $T$ to be the set of largest values
of $|f|$. Set $f_0 = f \cdot 1_T$. Then the property (1.8) gives

\[ \|h\|_{\ell_1} = \|f \cdot 1_{T^c}\|_{\ell_1} \le C_p \cdot |T|^{1-1/p} \]

for some constant $C_p$ only depending on $p$, and therefore (2.1) gives

\[ \|f^\sharp \cdot 1_{T^c}\|_{\ell_1} \le 4 C_p \cdot |T|^{1-1/p}. \tag{2.3} \]

Thus, $f^\sharp$ puts 'little mass' outside of the set $T$.

Corollary 2.2 Let $f^\sharp$ be any $\ell_1$-minimizer to the problem $(P_1)$ and rearrange the entries of
$f^\sharp$ in decreasing order of magnitude $|f^\sharp|_{(1)} \ge |f^\sharp|_{(2)} \ge \ldots \ge |f^\sharp|_{(N)}$. Under the hypotheses
of Lemma 2.1, the $m$-th largest entry of $f^\sharp$ obeys

\[ |f^\sharp|_{(m)} \le C_p \cdot |T|^{-1/p}, \quad \forall m > 2|T|. \tag{2.4} \]

Proof Suppose $T$ is the set of $|T|$ largest entries of $f$ as above so that $f^\sharp$ obeys (2.3). Denote
by $E_m$ the set of the $m$-largest values of the function $f^\sharp$. Obviously $|E_m \cap T^c| \ge m - |T|$
and, therefore,

\[ \|f^\sharp\|_{\ell_1(E_m \cap T^c)} \ge (m - |T|) \cdot |f^\sharp|_{(m)} \ge |T| \cdot |f^\sharp|_{(m)}. \]

The claim then follows from

\[ \|f^\sharp\|_{\ell_1(E_m \cap T^c)} \le \|f^\sharp \cdot 1_{T^c}\|_{\ell_1} \le C_p \cdot |T|^{1-1/p}. \]


3 Stability in the $\ell_2$-norm
3.1 Extension lemma

As essentially observed in [?], a matrix obeying (1.17) (think of it as a partial Fourier
transform) allows one to extend a function from a small set to all of $Z_N$ while constraining its
Fourier transform to a fixed random set:

Corollary 3.1 (Extension theorem) Assume that $F_\Omega$ is a matrix obeying the uniform
uncertainty principle UUP. Then with probability at least $1 - O(N^{-\rho/\alpha})$ the following
statement holds: for all sets $T \subset Z_N$ obeying the bound (1.17) and all functions $f \in \ell_2(T)$,
there exists $f^{\mathrm{ext}} \in \ell_2(Z_N)$ which

• agrees with $f$ on $T$ ($R_T f^{\mathrm{ext}} = f$),

• belongs to the column space of $F_\Omega^*$ (i.e. $f^{\mathrm{ext}} = F_\Omega^* V$ for some $V \in \ell_2(K)$),

• and furthermore, we have the $\ell_2$ estimates

\[ \|f^{\mathrm{ext}}\|_{\ell_2(E)} \le C \left(1 + \frac{\lambda |E|}{\alpha K}\right)^{1/2} \|f\|_{\ell_2(T)} \tag{3.1} \]

valid for all $E \subset Z_N$.

Proof We may assume that we are on an event such that the conclusions of UUP hold.
In particular, from (1.17), the operator $F_{\Omega T}^* F_{\Omega T}$ is invertible and the inverse obeys

\[ \|(F_{\Omega T}^* F_{\Omega T})^{-1}\| \le 2N/K, \tag{3.2} \]

where $\|\cdot\|$ is the operator norm induced by the $\ell_2$ norm. In the remainder of this
paper and unless specified otherwise $\|A\|$ will always be the operator norm
$\|A\| := \sup_{\|x\|_{\ell_2} = 1} \|Ax\|_{\ell_2}$. We now set $f^{\mathrm{ext}}$ as

\[ f^{\mathrm{ext}} := F_\Omega^* F_{\Omega T} (F_{\Omega T}^* F_{\Omega T})^{-1} f. \]

By construction, $f^{\mathrm{ext}}$ agrees with $f$ on $T$, and is in the column space of $F_\Omega^*$. Now we
prove (3.1). It suffices to do so when $|E| \le \alpha K/\lambda$, since the general claim then follows by
decomposing larger $E$'s into smaller sets and then square-summing. But from (1.17), we
see that $F_{\Omega T}$ and $F_{\Omega E}^*$ have operator norms of size at most $\sqrt{3K/2N}$, and the claim follows
by composing these facts with (3.2). ■

3.2 Proof of Theorem 1.4

Let $T_0$ (resp. $T_1$) be the set of the $S$-largest values of $|f|$ (resp. $|f^\sharp|$) and put $T = T_0 \cup T_1$.
By construction, $S \le |T| \le 2S$ and we assume that $|T|$ obeys the condition (1.16). Now
observe that by construction of $T$, a consequence of Lemma 2.1 is that

\[ \|f - f^\sharp\|_{\ell_1(T^c)} \le \|f\|_{\ell_1(T_0^c)} + \|f^\sharp\|_{\ell_1(T_0^c)} \le C_p \cdot |T|^{1-1/p}. \tag{3.3} \]

Furthermore, it follows from our assumption about $f$ and (2.4) that

\[ \|f - f^\sharp\|_{\ell_\infty(T^c)} \le \|f\|_{\ell_\infty(T_0^c)} + \|f^\sharp\|_{\ell_\infty(T_1^c)} \le C \cdot |T|^{-1/p}. \tag{3.4} \]

By interpolation, these last two inequalities give

\[ \|f - f^\sharp\|_{\ell_2(T^c)} \le C \cdot |T|^{1/2-1/p}, \tag{3.5} \]

and it remains to prove that the same bound holds over the set $T$.

In order to prove this fact, Corollary 3.1 assures us that one can find a function of the form
$g = F_\Omega^* V$ which matches $h := f - f^\sharp$ on $T$ and with the following property:

\[ \sum_{t \in E} |g(t)|^2 \le C \sum_{t \in T} |f(t) - f^\sharp(t)|^2, \tag{3.6} \]
for all sets $E$ of cardinality $O(K/\lambda)$ that are disjoint from $T$. Here and in the rest of the
proof, the constants $C$ are allowed to depend on $\alpha$. From the representation $g = F_\Omega^* V$ and
the constraint $F_\Omega f = F_\Omega f^\sharp$ (from (1.10)), we have

\[ \langle f - f^\sharp, g \rangle = \langle f - f^\sharp, F_\Omega^* V \rangle = \langle F_\Omega f - F_\Omega f^\sharp, V \rangle = 0, \]

and hence

\[ \sum_{t \in Z_N} (f - f^\sharp)(t) \overline{g(t)} = 0. \]

Splitting into $T$ and $T^c$, we obtain

\[ \sum_{t \in T} |f - f^\sharp|^2(t) = - \sum_{t \in T^c} (f - f^\sharp)(t) \overline{g(t)}. \tag{3.7} \]



We will use (3.7) to show that the left-hand side must be small since (3.5) and (3.6) assert
that the right-hand side is not very large. Enumerate $T^c$ as $n_1, n_2, \ldots, n_{N-|T|}$ in decreasing
order of magnitude of $|f - f^\sharp|$. We then group these into adjacent blocks $B_J$ of size $|T|$
(except perhaps for the last one) $B_J := \{n_j : J|T| < j \le (J+1)|T|\}$, $J = 0, 1, \ldots$. From
(3.6) and Cauchy-Schwarz, we have

\[ \Big| \sum_{j \in B_J} (f - f^\sharp)(n_j) \overline{g(n_j)} \Big| \le C \cdot \|f - f^\sharp\|_{\ell_2(T)} \cdot I_J, \tag{3.8} \]

where

\[ I_J := \sqrt{ \sum_{j = J|T|+1}^{(J+1)|T|} |(f - f^\sharp)(n_j)|^2 }. \]

Because we are enumerating the values of $n_j$ in decreasing order, we have
$I_0 \le |T|^{1/2} \cdot |f - f^\sharp|(n_1) \le C \cdot |T|^{1/2-1/p}$ while for $J \ge 1$ we have

\[ I_J \le |T|^{1/2} \cdot |f - f^\sharp|(n_{J|T|+1}) \le |T|^{1/2} \cdot |T|^{-1} \cdot \|f - f^\sharp\|_{\ell_1(B_{J-1})}. \]

In other words,

\[ \sum_J I_J \le I_0 + \sum_{J \ge 1} I_J \le C \cdot |T|^{-r} + |T|^{-1/2} \cdot \|f - f^\sharp\|_{\ell_1(T^c)} \]

and, therefore, it follows from (3.3) that the summation of the inequality (3.8) over the
blocks $B_J$ gives

\[ \Big| \sum_{t \in T^c} (f - f^\sharp)(t) \overline{g(t)} \Big| \le C \cdot |T|^{-r} \cdot \|f - f^\sharp\|_{\ell_2(T)}. \]

Inserting this back into (3.7), we established

\[ \|f - f^\sharp\|_{\ell_2(T)} \le C \cdot |T|^{-r}. \]

This concludes the proof of Theorem 1.4. ■

Note that by Cauchy-Schwarz, it follows from the proof of our Theorem that

\[ \|f - f^\sharp\|_{\ell_1(T)} \le C \cdot |T|^{1-1/p} \]

and, therefore, owing to (3.3), we also proved an $\ell_1$ stability estimate

\[ \|f - f^\sharp\|_{\ell_1} \le C \cdot |T|^{1-1/p}. \tag{3.9} \]

Had we assumed that $f$ belonged to the weak-$\ell_1$ ball when $p = 1$, the right-hand side of
(3.3) would read $C_1 \log(N/|T|)$ instead of just $C_1$. This is the reason why we required $\ell_1$ in
the hypothesis of Theorem 1.4 and showed that we also have a near-optimal signal recovery
result for the unit ball of $\ell_1$ with no additional losses (logarithmic or otherwise).

3.3 Uniqueness of the minimizer for the Gaussian ensemble 

The claim that the minimizer $f^\sharp$ is unique with probability 1, for Gaussian measurements,
can be easily established as follows. The claim is trivial for $f = 0$ so we may assume $f$ is
not identically zero. Then $F_\Omega f$ is almost surely non-zero. Furthermore, if one considers
each of the (finitely many) facets of the unit ball of $\ell_1(Z_N)$, we see that with probability 1
the random Gaussian matrix $F_\Omega$ has maximal rank on each of these facets (i.e. the image
of each facet under $F_\Omega$ has dimension equal to either $K$ or the dimension of the facet,
whichever is smaller). From this we see that every point on the boundary of the image of
the unit $\ell_1$-ball under $F_\Omega$ arises from a unique point on that ball. Similarly for non-zero
dilates of this ball. Thus the solution to the problem (1.10) is unique as claimed.

We remark that the question of establishing uniqueness with high probability for discretely 
randomized ensembles such as the binary and Fourier ensembles discussed below is an 
interesting one, but one which we will not pursue here. 

4 Eigenvalues of random matrices 

In this section, we show that all three ensembles obey the uniform uncertainty principle 
UUP. 

4.1 The Gaussian ensemble 

Let $X$ be an $n$ by $p$ matrix with $p \le n$ and with i.i.d. entries sampled from the normal
distribution with mean zero and variance $1/n$. We are interested in the singular values of $X$
or the eigenvalues of $X^*X$. A famous result due to Marchenko and Pastur [42] states that
the eigenvalues of $X^*X$ have a deterministic limit distribution supported by the interval
$[(1 - \sqrt{c})^2, (1 + \sqrt{c})^2]$ as $n, p \to \infty$, with $p/n \to c < 1$. In fact, results from [?] show
that the smallest (resp. largest) eigenvalue converges a.s. to $(1 - \sqrt{c})^2$ (resp. $(1 + \sqrt{c})^2$). In
other words, with this normalization the smallest singular value of $X$ converges a.s. to
$1 - \sqrt{c}$ and the largest to $1 + \sqrt{c}$. In addition, there are remarkably fine statements
concerning the speed of the convergence of the largest singular value [36].

To derive the UUP, we need a result about the concentration of the extreme singular values
of a Gaussian matrix, and we borrow a most elegant estimate due to Davidson and Szarek
[52]. We let $\lambda_1(X) \le \ldots \le \lambda_p(X)$ be the ordered list of the singular values of $X$. Then in
[52], the authors prove that

\[ \mathbf{P}\left(\lambda_p(X) > 1 + \sqrt{p/n} + r\right) \le e^{-nr^2/2}, \tag{4.1} \]
\[ \mathbf{P}\left(\lambda_1(X) < 1 - \sqrt{p/n} - r\right) \le e^{-nr^2/2}. \tag{4.2} \]

Such inequalities about the concentration of the largest and smallest singular values of
Gaussian matrices have been known for at least a decade or so. Estimates similar to (4.1)-
(4.2) may be found in the work of Szarek [?], see also Ledoux [39].

Lemma 4.1 The Gaussian ensemble obeys the uniform uncertainty principle (UUP) with
oversampling factor $\lambda = \log N$.

Proof Fix $K \ge \log N$ and let $\Omega := \{1, \ldots, K\}$. Let $T$ be a fixed index set and define the
event $E_T$ as

\[ E_T := \{\lambda_{\min}(F_{\Omega T}^* F_{\Omega T}) < K/2N\} \cup \{\lambda_{\max}(F_{\Omega T}^* F_{\Omega T}) > 3K/2N\}. \]

Since the entries of $F_{\Omega T}$ are i.i.d. $N(0, 1/N)$, it follows from (4.1)-(4.2) by a simple
renormalization that for each $|T| \le K/16$,

\[ \mathbf{P}(E_T) \le 2 e^{-cK} \]

where one can choose $c = 1/32$ by selecting $r = 1/4$ in (4.1)-(4.2). We now examine the
tightness of the spectrum over all sets $T \in \mathcal{T}_m := \{T : |T| \le m\}$ where we assume that $m$
is less than $N/2$. We have

\[ \mathbf{P}\Big(\bigcup_{\mathcal{T}_m} E_T\Big) \le 2 e^{-cK} \cdot |\mathcal{T}_m| = 2 e^{-cK} \cdot \sum_{k=1}^{m} \binom{N}{k} \le 2 e^{-cK} \cdot m \binom{N}{m}. \]

We now use the well-known bound on the binomial coefficient

\[ \binom{N}{m} \le e^{N H(m/N)}, \]

where for $0 < q < 1$, $H$ is the binary entropy function

\[ H(q) := -q \log q - (1 - q) \log(1 - q). \]

The inequality $-(1 - q) \log(1 - q) \le q$ shows that $-(1 - m/N) \log(1 - m/N) \le m/N$ and
thus

\[ m \binom{N}{m} \le e^{m \log(N/m) + m + \log m}. \]

Whence,

\[ \log \mathbf{P}\Big(\bigcup_{\mathcal{T}_m} E_T\Big) \le \log 2 - cK + m\left(\log(N/m) + 1 + m^{-1} \log m\right) \le \log 2 - \rho K \]

provided that $m(\log(N/m) + 1 + m^{-1} \log m) \le (c - \rho) K$, which is what we needed to
establish. (We need to assume that $K \ge (c - \rho)^{-1} (1 + \log N)$ for the claim not to be
vacuous.) Note that we proved more than what we claimed since the UUP holds for an
oversampling factor proportional to $\log(N/K)$. ■
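
The counting step in this proof can be checked numerically. The sketch below is only an illustration with hypothetical helper names (`binary_entropy`, `count_small_sets`, `log_union_bound`) and arbitrarily chosen values of $N$ and $m$; it compares the exact logarithm of the number of sets $T$ with $1 \le |T| \le m$ against the entropy bound $\log m + N H(m/N)$ and the simplified bound $m \log(N/m) + m + \log m$, using natural logarithms throughout.

```python
import math


def binary_entropy(q):
    """Binary entropy H(q) in nats, for 0 < q < 1."""
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)


def count_small_sets(N, m):
    """Exact number of nonempty index sets T inside {1, ..., N} with |T| <= m."""
    return sum(math.comb(N, k) for k in range(1, m + 1))


def log_union_bound(N, m):
    """The simplified bound m*log(N/m) + m + log(m) on the log of the count (needs m <= N/2)."""
    return m * math.log(N / m) + m + math.log(m)


N, m = 1000, 20
exact = math.log(count_small_sets(N, m))
entropy_bound = math.log(m) + N * binary_entropy(m / N)
print(exact, entropy_bound, log_union_bound(N, m))
```

The three printed numbers should appear in increasing order, which is the chain of inequalities used to pass from a single set $T$ to the union over all of $\mathcal{T}_m$.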
4.2 The binary ensemble

The analysis is more complicated in the case where the matrix $X$ is an $n$ by $p$ array
with i.i.d. symmetric Bernoulli entries taking on values in $\{-1/\sqrt{n}, 1/\sqrt{n}\}$. To study the
concentration of the largest singular value of $X$, we follow an approach proposed by Ledoux
[39] which makes a simple use of the concentration property, see also [28].

As before, we let $\lambda_p(X)$ be the mapping that associates to a matrix $X$ its largest singular
value. Equip $R^{np}$ with the Frobenius norm

\[ \|X\|_F^2 := \mathrm{Tr}(X^*X) = \sum_{i=1}^{n} \sum_{j=1}^{p} |X_{ij}|^2 \]

(the Euclidean norm over $R^{np}$). Then the mapping $\lambda_p$ is convex and 1-Lipschitz in the
sense that

\[ |\lambda_p(X) - \lambda_p(X')| \le \|X - X'\|_F \]

for all pairs $(X, X')$ of $n$ by $p$ matrices. A classical application of the concentration
inequality for binary measures [39] then gives

\[ \mathbf{P}\left(\lambda_p(X) - m(\lambda_p(X)) \ge r\right) \le e^{-nr^2/16}, \tag{4.3} \]

where $m(\lambda_p(X))$ is either the mean or the median of $\lambda_p(X)$. Now the singular values still exhibit
the same behavior; that is, with this normalization $\lambda_{\min}(X)$ and $\lambda_{\max}(X)$ converge a.s. to
$1 - \sqrt{c}$ and $1 + \sqrt{c}$ respectively, as $n, p \to \infty$ with $p/n \to c$ [?]. As a consequence, for each
$\epsilon_0$ and $n$ sufficiently large, one can show that the medians belong to the fixed interval
$[1 - \sqrt{p/n} - \epsilon_0, 1 + \sqrt{p/n} + \epsilon_0]$ which gives

\[ \mathbf{P}\left(\lambda_p(X) > 1 + \sqrt{p/n} + \epsilon_0 + r\right) \le e^{-nr^2/16}. \tag{4.4} \]

This is a fairly well-established result [28].

The problem is that this method does not apply to the minimum singular value which is
1-Lipschitz but not convex. Fortunately, Litvak, Pajor, Rudelson and Tomczak-Jaegermann
[40, Theorem 3.1] have recently announced a result which gives exponential concentration
for the lowest singular value. They proved that whenever $n \ge (1 + \delta) p$ where $\delta$ is greater
than a small constant,

\[ \mathbf{P}\left(\lambda_1(X) \le c_1\right) \le e^{-c_2 n}, \tag{4.5} \]

where $c_1$ and $c_2$ are universal positive constants.

Just as (4.1)-(4.2) implied the uniform uncertainty principle UUP for Gaussian matrices,
(4.4)-(4.5) gives the same conclusion for the binary ensemble with the proviso that the
condition about the lowest singular value reads $\lambda_{\min}(F_{\Omega T}^* F_{\Omega T}) \ge c_1 K/N$; i.e., $c_1$ substitutes
1/2 (recall the remark following the definition of the UUP).

Lemma 4.2 The binary ensemble obeys the uniform uncertainty principle (UUP) with
oversampling factor $\lambda = \log N$.

The proof is of course identical to that of Lemma 4.1. If we define $E_T$ as

\[ E_T := \{\lambda_{\min}(F_{\Omega T}^* F_{\Omega T}) < c_1 K/N\} \cup \{\lambda_{\max}(F_{\Omega T}^* F_{\Omega T}) > 3K/2N\}, \]

we have $\mathbf{P}(E_T) \le 2 e^{-cK}$ for some constant $c > 0$. The rest of the proof is as before.






4.3 The Fourier ensemble 



The analysis for the Fourier ensemble is much more delicate than that for the Gaussian and
binary cases, in particular requiring entropy arguments as used for instance by Bourgain
[?], [7]. We prove the following lemma in the separate Section 7.

Lemma 4.3 The Fourier ensemble obeys the uniform uncertainty principle UUP with
oversampling factor $\lambda = (\log N)^6$.

The exponent of 6 can almost certainly be lowered$^5$, but we will not attempt to seek the
optimal exponent of $\log N$ here.

5 Generic signals and the weak ERP 

In some cases, it might be difficult to prove that the exact reconstruction principle ERP
holds, and it is interesting to observe that UUP actually implies ERP for 'generic' sign
functions $\sigma = \pm 1$ supported on a small set $T$. More precisely, if we fix $T$ and define $\sigma$ to
be supported on $T$ with the i.i.d. Bernoulli distribution (independently of $F_\Omega$), thus

\[ \mathbf{P}(\sigma(t) = \pm 1) = 1/2, \quad \text{for all } t \in T, \]

then we shall construct a $P$ obeying the conditions (i)-(iii) in the definition of ERP. Indeed,
we shall construct $P$ explicitly as

\[ P := F_\Omega^* F_{\Omega T} (F_{\Omega T}^* F_{\Omega T})^{-1} R_T \sigma; \tag{5.1} \]

one can view this choice of $P = F_\Omega^* V$ as the unique solution to (i) and (ii) which minimizes
the $\ell_2$ norm of $V$, and can thus be viewed as a kind of least-squares extension of $\sigma$ using
the rows of $F_\Omega$.
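
To make the least-squares extension concrete, here is a minimal sketch in plain Python for a small real Gaussian matrix. The helper names (`solve`, `dual_certificate`), the dimensions and the choice of $T$ and $\sigma$ are all hypothetical and chosen only for illustration. The code forms the Gram matrix $F_{\Omega T}^* F_{\Omega T}$, solves for $w = (F_{\Omega T}^* F_{\Omega T})^{-1} \sigma$, sets $V = F_{\Omega T} w$ and returns $P = F_\Omega^* V$. By construction $P$ agrees with $\sigma$ on $T$; the size of $P$ off $T$ depends on the random draw and is printed rather than asserted.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def dual_certificate(F, T, sigma):
    """Return P = F^* F_T (F_T^* F_T)^{-1} sigma for a real K x N matrix F.
    T lists the column indices of the support; sigma lists the signs on T."""
    K, N = len(F), len(F[0])
    # Gram matrix of the columns indexed by T
    G = [[sum(F[k][s] * F[k][t] for k in range(K)) for t in T] for s in T]
    w = solve(G, sigma)
    # V = F_T w lives in the measurement space
    V = [sum(F[k][t] * w[i] for i, t in enumerate(T)) for k in range(K)]
    # P = F^* V is defined on all N coordinates
    return [sum(F[k][t] * V[k] for k in range(K)) for t in range(N)]


rng = random.Random(0)
K, N = 40, 120
F = [[rng.gauss(0.0, 1.0) / N ** 0.5 for _ in range(N)] for _ in range(K)]
T = [3, 17, 58]
sigma = [1.0, -1.0, 1.0]
P = dual_certificate(F, T, sigma)
print([round(P[t], 6) for t in T])                         # matches sigma up to rounding
print(max(abs(P[t]) for t in range(N) if t not in T))      # size of P away from T
```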

It is immediate to check that $P$ obeys (i) and (ii) above. Indeed, the restriction of $P$ to $T$
is given by

\[ R_T P = R_T F_\Omega^* F_{\Omega T} (F_{\Omega T}^* F_{\Omega T})^{-1} R_T \sigma = F_{\Omega T}^* F_{\Omega T} (F_{\Omega T}^* F_{\Omega T})^{-1} R_T \sigma = R_T \sigma \]

and, therefore, (i) is verified. Further, it follows from the definition that $P$ is a linear
combination of the columns of $F_\Omega^*$ and thus, (ii) holds. Therefore, we only need to check
that for all $t \in T^c$, $|P(t)| \le 1/2$ with sufficiently high probability. In order to do this, we
rewrite $P(t)$ as

\[ P(t) = \langle W_t, R_T \sigma \rangle, \]

where for each $t \in T^c$, $W_t$ is the $|T|$-dimensional vector

\[ W_t := (F_{\Omega T}^* F_{\Omega T})^{-1} F_{\Omega T}^* F_t \]

and $F_t$ is the $t$-th column of $F_\Omega$. We now introduce another condition which is far easier to
check than ERP.

5 Note added in proof: since the submission of this paper, Rudelson and Vershynin, in a very recent piece
of work [47], have improved the oversampling factor to $(\log N)^4$.
WERP (Weak ERP). We say that the measurement process obeys the weak ERP, if for
each fixed $T$ obeying (1.16) and any $0 < \gamma \le 1$, $F_\Omega$ obeys

\[ \|F_{\Omega T}^* F_t\|_{\ell_2} \le \gamma \sqrt{K |T|}/N \quad \text{for all } t \in T^c \tag{5.2} \]

with probability at least $1 - O(N^{-\rho/\gamma})$ for some fixed positive constant $\rho > 0$.

For example, it is an easy exercise in large deviation theory to show that WERP holds for
Gaussian and binary measurements. One can also check that WERP holds for random
frequency samples. We omit the proof of these facts, however, since we will show the
stronger version, namely, ERP in all three cases. Instead, we would like to emphasize that
UUP together with WERP actually imply ERP for most sign patterns $\sigma$.



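We begin by recalling the classical Hoeffding inequality: let $X_1, \ldots, X_N = \pm 1$ be independent
symmetric Bernoulli random variables and consider the sum $S = \sum_{j=1}^{N} a_j X_j$. Then

\[ \mathbf{P}(|S| \ge \lambda) \le 2 \exp\left(-\frac{\lambda^2}{2\sigma^2}\right), \qquad \sigma^2 := \sum_{j=1}^{N} a_j^2. \tag{5.3} \]

Suppose now that the $\sigma(t)$'s are independent Bernoulli, and independent of $F$. Then
(5.3) gives

\[ \mathbf{P}\left(|P(t)| > \lambda \mid \|W_t\|_{\ell_2} = \rho\right) \le 2 \exp\left(-\frac{\lambda^2}{2\rho^2}\right). \]

If we now assume that both UUP and WERP hold, then for any $0 < \gamma \le 1$ we have

\[ \|W_t\|_{\ell_2} \le \|(F_{\Omega T}^* F_{\Omega T})^{-1}\| \cdot \|F_{\Omega T}^* F_t\|_{\ell_2} \le 2\gamma \cdot \sqrt{|T|/K} \]

with probability at least $1 - O(N^{-\rho/\gamma})$. This shows that

\[ \mathbf{P}(|P(t)| > \lambda) \le \mathbf{P}\left(|P(t)| > \lambda \mid \|W_t\|_{\ell_2} \le 2\gamma \sqrt{|T|/K}\right) + \mathbf{P}\left(\|W_t\|_{\ell_2} > 2\gamma \sqrt{|T|/K}\right) \]
\[ \le 2 \exp\left(-\frac{\lambda^2 K}{8 \gamma^2 |T|}\right) + O(N^{-\rho/\gamma}) = 2 e^{-\rho_0 K/|T|} + O(N^{-\rho/\gamma}), \]

where $\rho_0 := \lambda^2/(8\gamma^2)$. Hence, if $|T| \le \alpha K/\log N$, then

\[ \mathbf{P}\Big(\sup_{t \in T^c} |P(t)| > 1/2\Big) \le O(N \cdot N^{-\rho_0/\alpha}) + O(N^{-\rho/\alpha}). \]

Therefore, if $\alpha$ is chosen small enough, then for some small $\rho' > 0$

\[ \mathbf{P}\Big(\sup_{t \in T^c} |P(t)| > 1/2\Big) = O(N^{-\rho'/\alpha}). \]

In other words, ERP holds for most sign patterns $\sigma$. That is, if one is only interested
in providing good reconstruction to nearly all signals (but not all) in the sense discussed
above, then it is actually sufficient to check that both conditions UUP and WERP are
valid.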
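
The Hoeffding inequality (5.3) drives the argument above, and its tail bound is easy to compare against simulation. The following is a rough Monte Carlo sketch, assuming the standard form of the inequality with $\sigma^2 = \sum_j a_j^2$; the function names, weights, thresholds and trial counts are illustrative assumptions only.

```python
import math
import random


def hoeffding_bound(a, lam):
    """Right-hand side 2*exp(-lam^2 / (2*sigma^2)) with sigma^2 = sum of a_j^2."""
    sigma2 = sum(x * x for x in a)
    return 2.0 * math.exp(-lam * lam / (2.0 * sigma2))


def empirical_tail(a, lam, trials, rng):
    """Fraction of trials in which |sum_j a_j X_j| >= lam for random signs X_j."""
    hits = 0
    for _ in range(trials):
        s = sum(x * rng.choice((-1.0, 1.0)) for x in a)
        if abs(s) >= lam:
            hits += 1
    return hits / trials


rng = random.Random(0)
a = [1.0 / math.sqrt(50)] * 50          # weights normalised so that sigma^2 = 1
for lam in (1.0, 2.0, 3.0):
    print(lam, empirical_tail(a, lam, 5000, rng), hoeffding_bound(a, lam))
```

In the text the weights are the entries of $W_t$, so a small $\|W_t\|_{\ell_2}$ (guaranteed by UUP and WERP) translates directly into a small probability that $|P(t)|$ exceeds $1/2$.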



6 About the exact reconstruction principle 

In this section, we show that all the three ensembles obey the exact reconstruction principle 
ERP. 
6.1 The Gaussian ensemble

To show that there is a function $P$ obeying the conditions (i)-(iii) in the definition of ERP,
we take an approach that resembles that of Section 5 and establish that $P$ defined as in
(5.1),

\[ P := F_\Omega^* F_{\Omega T} (F_{\Omega T}^* F_{\Omega T})^{-1} R_T \sigma, \]

obeys the three conditions (i)-(iii).

We already argued that $P$ obeys (i) and (ii). Put $P^c = R_{T^c} P$ to be the restriction of $P$ to
$T^c$. We need to show that

\[ \sup_{t \in T^c} |P^c(t)| \le 1/2 \]

with high probability. Begin by factorizing $P^c$ as

\[ P^c = F_{\Omega T^c}^* V, \quad \text{where } V := F_{\Omega T} (F_{\Omega T}^* F_{\Omega T})^{-1} R_T \sigma. \]

The crucial observation is that the random matrix $F_{\Omega T^c}$ and the random variable $V$ are
independent since they are functions of disjoint sets of independent variables.

Proposition 6.1 Conditional on $V$, the components of $P^c$ are i.i.d. Gaussian with

\[ P^c(t) \sim N(0, \|V\|_{\ell_2}^2/N). \]

Proof Suppose $V$ is fixed. By definition,

\[ P^c(t) = \frac{1}{\sqrt{N}} \sum_{k \in \Omega} X_{k,t} V_k, \]

where the $X_{k,t}$ are the i.i.d. standard normal variables generating the entries of $F_\Omega$, and,
therefore, it follows from the independence between the $X_{k,t}$'s and $V$ for each $t \in T^c$
that the conditional distribution of $P^c(t)$ is normal with mean 0 and variance $\|V\|_{\ell_2}^2/N$. The
independence between the components of $P^c$ is a simple consequence of the independence
between the columns of $F$. ■



Lemma 6.2 Let $\alpha > 0$ be sufficiently small, and suppose that $|T|$ is chosen as in (1.16) so
that UUP holds. The components of $P^c(t)$ obey

\[ \mathbf{P}(|P^c(t)| > \lambda) \le \mathbf{P}\left(|Z| > \lambda \cdot \sqrt{K/6|T|}\right) + O(N^{-\rho/\alpha}) \]

where $Z \sim N(0, 1)$ is a standard normal random variable.

Proof Observe that

\[ \|V\|_{\ell_2} \le \|F_{\Omega T}\| \cdot \|(F_{\Omega T}^* F_{\Omega T})^{-1}\| \cdot \|R_T \sigma\|_{\ell_2}. \]

On the event such that the conclusions of UUP hold, $\|F_{\Omega T}\| \le \sqrt{3K/2N}$ and also
$\|(F_{\Omega T}^* F_{\Omega T})^{-1}\| \le 2N/K$. Since $\|R_T \sigma\|_{\ell_2} = \sqrt{|T|}$, this gives

\[ \|V\|_{\ell_2} \le \sqrt{\frac{6N \cdot |T|}{K}}. \]

Therefore,

\[ \mathbf{P}(|P^c(t)| > \lambda) \le \mathbf{P}\left(|P^c(t)| > \lambda \mid \|V\|_{\ell_2} \le \sqrt{6N|T|/K}\right) + \mathbf{P}\left(\|V\|_{\ell_2} > \sqrt{6N|T|/K}\right). \]

The first term is bounded by Proposition 6.1 while the second is bounded via the uniform
uncertainty principle UUP. This establishes the lemma. ■

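The previous lemma showed that for $|T| \le \alpha \cdot K/\log N$,

\[ \mathbf{P}\Big(\sup_{t \in T^c} |P^c(t)| > 1/2\Big) \le N \cdot \mathbf{P}\left(|Z| > \frac{1}{2} \sqrt{\frac{\log N}{6\alpha}}\right) + O(N^{-\rho/\alpha}) \le 2 N^{1 - 1/(48\alpha)} + O(N^{-\rho/\alpha}). \]

Therefore, if $\alpha$ is chosen small enough, then for some small $\rho' > 0$

\[ \mathbf{P}\Big(\sup_{t \in T^c} |P^c(t)| > 1/2\Big) = O(N^{-\rho'/\alpha}). \]

In short, we proved:

Lemma 6.3 The Gaussian ensemble obeys the exact reconstruction principle ERP with
oversampling factor $\lambda = \log N$.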

6.2 The binary ensemble 

The strategy in the case where the entries of $F$ are independent Bernoulli variables is nearly
identical and we only discuss the main differences. Define $P$ and $V$ as above; obviously,
$F_{\Omega T^c}$ and $V$ are still independent.

Proposition 6.4 Conditional on $V$, the components of $P^c$ are independent and obey

\[ \mathbf{P}(|P^c(t)| > \lambda \mid V) \le 2 \exp\left(-\frac{\lambda^2 N}{2 \|V\|_{\ell_2}^2}\right). \]

Proof The conditional independence of the components is as before. As far as the tail
bound is concerned, we observe that $P^c(t)$ is a weighted sum of independent Bernoulli
variables and the claim follows from the Hoeffding inequality (5.3). ■

The rest of the argument is as before. If $|T|$ is selected as in (1.16) such that UUP holds,
one has

\[ \mathbf{P}(|P^c(t)| > 1/2) \le 2 N^{-1/(48\alpha)} + O(N^{-\rho/\alpha}). \]

And, of course, identical calculations now give

Lemma 6.5 The binary ensemble obeys the exact reconstruction principle ERP with
oversampling factor $\lambda = \log N$.
6.3 The Fourier ensemble

It turns out that the exact reconstruction principle also holds for the Fourier ensemble
although the argument is considerably more involved [11]. We do not reproduce the proof
here but merely indicate the strategy for proving that $P$ (defined as before) also obeys the
desired bound on the complement of $T$ with sufficiently high probability. We first remark
that $|\Omega|$ is concentrated around $K$. To see this, recall Bernstein's inequality [?] which
states that if $X_1, \ldots, X_m$ are independent random variables with mean zero and obeying
$|X_j| \le c$, then

\[ \mathbf{P}\Big(\Big|\sum_{j=1}^{m} X_j\Big| > \lambda\Big) \le 2 \exp\left(-\frac{\lambda^2}{2\sigma^2 + 2c\lambda/3}\right), \tag{6.1} \]

where $\sigma^2 = \sum_{j=1}^{m} \mathrm{Var}(X_j)$. Specializing this inequality gives the following lemma which we
shall need later in this paper.

Lemma 6.6 Fix $\tau \in (0, 1)$ and let $I_k \in \{0, 1\}$ be an i.i.d. sequence of random variables
obeying $\mathbf{P}(I_k = 1) = \tau$. Let $a \in \ell_2(Z_N)$ be arbitrary, and set $\sigma^2 := \tau(1 - \tau) \|a\|_{\ell_2}^2$. Then

\[ \mathbf{P}\Big(\Big|\sum_{k \in Z_N} (I_k - \tau) a(k)\Big| > \lambda\Big) \le 4 \exp\left(-\frac{\lambda^2}{4(\sigma^2 + \lambda \|a\|_\infty/(3\sqrt{2}))}\right). \]

Proof Letting $S$ be the sum $\sum_{k \in Z_N} (I_k - \tau) a(k)$, the proof follows from (6.1) by simply
stating that $\mathbf{P}(|S| > \lambda)$ is bounded above by the sum $\mathbf{P}(|S_1| > \lambda/\sqrt{2}) + \mathbf{P}(|S_2| > \lambda/\sqrt{2})$,
where $S_1$ and $S_2$ are the real and imaginary parts of $S$ respectively. ■

Thus the bound on the quantity $|\sum_{k \in Z_N} (I_k - \tau) a(k)|$ exhibits a Gaussian-type behavior
at small thresholds $\lambda$, and an exponential-type behavior at large thresholds.

Recall that $K = \mathbf{E}(|\Omega|) = N\tau$. Applying Lemma 6.6 with $a \equiv 1$ (so $\sigma^2 = N\tau(1 - \tau)$), we
have that $K/2 \le |\Omega| \le 2K$ with probability $1 - O(N^{-\rho/\alpha})$ provided that $K \ge C \alpha^{-1} \log N$,
which we will assume as the claim is vacuous otherwise. In the sequel, we assume that we
are on the event $\{K/2 \le |\Omega| \le 2K\}$.

Decompose $F_{\Omega T}^* F_{\Omega T}$ as

\[ F_{\Omega T}^* F_{\Omega T} = \frac{|\Omega|}{N} (I - H), \]

where $H$ is the matrix defined by $H(t, t') = -|\Omega|^{-1} \sum_{k \in \Omega} e^{i 2\pi k (t - t')/N}$ if $t \ne t'$ and 0 otherwise.
We then expand the inverse as a truncated Neumann series

\[ (F_{\Omega T}^* F_{\Omega T})^{-1} = \frac{N}{|\Omega|} (I + H + \ldots + H^n + E), \]

where $E$ is a small remainder term. This allows us to express $P^c$ as

\[ P^c = \frac{N}{|\Omega|} \cdot F_{\Omega T^c}^* F_{\Omega T} \cdot (I + H + \ldots + H^n + E) R_T \sigma, \]

and one can derive bounds on each individual term so that the sum obeys the desired
property. By pursuing this strategy, the following claim was proved in [11].

Lemma 6.7 The Fourier ensemble obeys the exact reconstruction principle ERP with
oversampling factor $\lambda = \log N$.
7 Uniform Uncertainty Principles for the Fourier Ensemble



In this section, we prove Lemma 4.3. The ideas here are inspired by an entropy argument
sketched in [7], as well as by related arguments in [?], [13]. These methods have since
become standard in the high-dimensional geometry literature, but we shall give a mostly
self-contained presentation here.

We remark that the arguments in this section (and those in the Appendix) do not use any
algebraic properties of the Fourier transform other than the Plancherel identity and the
fact that the maximum entry of the Fourier matrix is bounded by $1/\sqrt{N}$. Indeed a simple
modification of the arguments we give below also gives the UUP for randomly sampled rows
of orthonormal matrices, see also [?] and [?] for further discussion of this issue. Suppose
that $\sup_{i,j} |U_{ij}| \le \mu$ and let $U_\Omega$ be the matrix obtained by randomly selecting rows. Then
the UUP holds for

■n < c m 

fi 2 log 6 iV' 

In the case where one observes a few coefficients in the basis $\Phi$ when the signal is sparse in
another basis $\Psi$, $\mu = \sqrt{N} \sup_{i,j} |\langle \phi_i, \psi_j \rangle|$ is interpreted as the mutual coherence between $\Phi$
and $\Psi$ [?].

For sake of concreteness, we now return to the Fourier ensemble. Let us first set up what
we are trying to prove. Fix $\alpha > 0$, which we shall assume to be sufficiently small. We may
take $N$ to be large depending on $\alpha$, as the claim is vacuous when $N$ is bounded depending
on $\alpha$. If $T$ is empty then the claim is trivial, so from (1.16) we may assume that

\[ K = \tau N \ge C \log^6 N \tag{7.1} \]

for some (possibly) large constant $C$.

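We need to prove (1.17). By self-adjointness, it would suffice to show that with probability
at least $1 - O(N^{-\rho/\alpha})$

\[ \left| \langle F_{\Omega T}^* F_{\Omega T} f, f \rangle_{\ell_2(T)} - \tau \|f\|_{\ell_2(T)}^2 \right| \le \frac{1}{4} \tau \|f\|_{\ell_2(T)}^2 \]

for all $f \in \ell_2(T)$ and all $T$ obeying (1.16), thus $|T| \le m$, where

\[ m := \alpha \tau N/\log^6 N. \tag{7.2} \]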



For any fixed $T$ and $f$, the above type of estimate can easily be established with high
probability by standard tools such as Lemma 6.6. The main difficulty is that there are
an exponentially large number of possible $T$ to consider, and for each fixed $T$ there is a
$|T|$-dimensional family of $f$ to consider. The strategy is to cover the set of all $f$ of interest
by various finite nets at several scales, obtain good bounds on the size of such nets, obtain
large deviation estimates for the contribution caused by passing from one net to the net at
the next scale, and sum using the union bound.

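We turn to the details. We can rewrite our goal as

\[ \Big| \sum_{k \in \Omega} |\hat{f}(k)|^2 - \tau \|f\|_{\ell_2(T)}^2 \Big| \le \frac{1}{4} \tau \|f\|_{\ell_2(T)}^2 \]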
whenever $|T| \le m$. From Parseval's identity, this is the same as asking that



\[ \Big| \sum_{k \in Z_N} (I_k - \tau) |\hat{f}(k)|^2 \Big| \le \frac{1}{4} \tau \|f\|_{\ell_2(T)}^2 \]

whenever $|T| \le m$. Now let $U_m \subset \ell_2(Z_N)$ denote the set

\[ U_m := \bigcup \{B_{\ell_2(E)} : E \subset Z_N, |E| = m\} = \{f \in \ell_2(Z_N) : \|f\|_{\ell_2(Z_N)} \le 1, |\mathrm{supp}(f)| \le m\}. \]
Then the previous goal is equivalent to showing that 



\[ \sup_{f \in U_m} \Big| \sum_{k \in Z_N} (I_k - \tau) |\hat{f}(k)|^2 \Big| \le \frac{1}{4} \tau \]

with probability at least $1 - O(N^{-\rho/\alpha})$ for some $\rho > 0$. In fact we shall obtain the stronger
estimate



sup 

feu m 



£ (4-r)|/»| 5 



fceZj- 



> -r) =0 (^exp ^-Ilog^iV 



(7.3) 



for some constant $\beta > 0$.



It remains to prove (7.3). The left-hand side of (7.3) is the large deviation probability of
a supremum of random sums over $U_m$. This type of expression can be handled by entropy
estimates on $U_m$, as was done in [?], [7], [43]. To follow their approach, we need some
notation. For any $f \in \ell_2(Z_N)$, we let $\hat{f}$ be its discrete Fourier transform (1.3) and define
the $X$ norm of $f$ by

\[ \|f\|_X := \sqrt{N} \cdot \|\hat{f}\|_{\ell_\infty}. \]

Intuitively, if $f$ is a "generic" function bounded in $\ell_2(Z_N)$ we expect the $X$ norm of $f$ to
also be bounded (by standard large deviation estimates). We shall need this type of
control in order to apply Lemma 6.6 effectively. To formalize this intuition we shall need
entropy estimates on $U_m$ in the $X$ norm. Let $B_X$ be the unit ball of $X$ in $\ell_2(Z_N)$. Thus for
instance $U_m$ is contained inside the ball $\sqrt{m} \cdot B_X$, thanks to Cauchy-Schwarz. However we
have much better entropy estimates available on $U_m$ in the $X$ norm, which we now state.



Definition 7.1 (Kolmogorov entropy) Let $X$ be a (finite-dimensional) normed vector
space with norm $\|\cdot\|_X$, and let $B_X := \{x \in X : \|x\|_X \le 1\}$ be the unit ball of $X$. If
$U \subset X$ is any bounded non-empty subset of $X$ and $r > 0$, we define the covering number
$N(U, B_X, r) \in \mathbb{Z}^+$ to be the least integer $N$ such that there exist elements $x_1, \ldots, x_N \in X$
such that the balls $x_j + rB_X = \{x \in X : \|x - x_j\|_X \le r\}$, $1 \le j \le N$, cover $U$, and the
Kolmogorov entropy as

$$\mathcal{E}(U, B_X, r) := \log_2(N(U, B_X, r)).$$
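
To make the definition concrete, here is a minimal sketch that upper-bounds the covering number of a finite point set by a greedy construction. It is only an illustration under our own assumptions: the centers are restricted to the point set itself, the set is finite, and the sup norm stands in for $\|\cdot\|_X$; the helper names `greedy_cover` and `sup_norm` are hypothetical. A greedy cover is always a valid cover, so its size bounds $N(U, B_X, r)$ from above but need not equal it.

```python
import math

def greedy_cover(points, norm, r):
    """Pick centers from `points` so that every point lies within distance r of some center."""
    centers = []
    for p in points:
        # p becomes a new center only if no existing center already covers it.
        if all(norm([a - b for a, b in zip(p, c)]) > r for c in centers):
            centers.append(p)
    return centers

def sup_norm(v):
    return max(abs(a) for a in v)

# Hypothetical example: a 4 x 4 integer grid in the plane, covered by sup-norm balls of radius 1.
points = [(float(i), float(j)) for i in range(4) for j in range(4)]
centers = greedy_cover(points, sup_norm, 1.0)
entropy_upper_bound = math.log2(len(centers))
print(len(centers), entropy_upper_bound)
```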



Proposition 7.2 We have

$$\mathcal{E}(U_m, B_X, r) \le C \cdot m \log N \cdot \min(r^{-2}\log N,\ 1) \qquad (7.4)$$

for all $r > N^{-2}$.

This proposition is essentially contained in [8], [7], [43]; for the sake of completeness we give a
proof of the proposition in the Appendix. Let us assume this proposition for the moment
and conclude the proof of Lemma 4.3. Set

$$\{J_0, \ldots, J_1\} = \{j \in \mathbb{Z} : N^{-2} \le 2^j \le \sqrt{m}\}$$

and fix $r = 2^j$ in Proposition 7.2. By (7.4) one can find a finite subset $A_j$ of $U_m$ of cardinality

$$|A_j| \le \exp\Bigl(\frac{C}{1 + 2^{2j}} \cdot m \log^2 N\Bigr) \qquad (7.5)$$

such that for all $f \in U_m$, there exists $f_j \in A_j$ such that $\|f - f_j\|_X \le 2^j$. Let us fix such
sets $A_j$. Then for any $f \in U_m$, we have the telescoping decomposition

$$f = f_{-\infty} + \sum_{J_0 \le j \le J_1} (f_j - f_{j+1})$$

where $f_j \in A_j$ and $f_{-\infty} := f - f_{J_0}$; here we have the convention that $f_j = 0$ and $A_j = \{0\}$
if $j > J_1$. By construction, $\|f_j - f_{j+1}\|_X \le 2^{j+2}$, and $\|f_{-\infty}\|_X \le 2N^{-2}$. We write
$g_j := f_j - f_{j+1}$; thus $\|g_j\|_X \le 2^{j+2}$. Fix $k$ and observe the crude estimates

$$|\hat f(k)| \le \|\hat f\|_{\ell_2} = \|f\|_{\ell_2} \le 1 \quad \text{and} \quad |\hat f_{-\infty}(k)| \le \|f_{-\infty}\|_X/\sqrt{N} \le 2N^{-5/2}.$$

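It then follows from $|a + b|^2 \le |a|^2 + |b|^2 + 2|a||b|$ that

$$|\hat f(k)|^2 = \Bigl|\sum_{J_0 \le j \le J_1} \hat g_j(k)\Bigr|^2 + O(N^{-5/2}).$$

Multiplying this by $I_k - \tau$ and summing, we obtain

$$\Bigl|\sum_{k \in \mathbb{Z}_N} (I_k - \tau)|\hat f(k)|^2\Bigr| = \Bigl|\sum_{k \in \mathbb{Z}_N} (I_k - \tau)\Bigl|\sum_{J_0 \le j \le J_1} \hat g_j(k)\Bigr|^2\Bigr| + O(N^{-3/2}) \le 2\sum_{J_0 \le j \le j' \le J_1} Q(g_j, g_{j'}) + O(N^{-3/2}),$$

where $Q(g_j, g_{j'})$ is the nonnegative random variable

$$Q(g_j, g_{j'}) := \Bigl|\sum_{k \in \mathbb{Z}_N} (I_k - \tau)\,\mathrm{Re}\bigl(\hat g_j(k)\overline{\hat g_{j'}(k)}\bigr)\Bigr|.$$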



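By (7.1), the error term $O(N^{-3/2})$ is less than $\tau/20$ (say) if $N$ is sufficiently large. Thus
to prove (7.3) it suffices to show that

$$\mathbf{P}\Bigl(\sum_{J_0 \le j \le j' \le J_1}\ \sup_{\substack{g_j \in A_j - A_{j+1},\ g_{j'} \in A_{j'} - A_{j'+1} \\ \|g_j\|_X \le 2^{j+2},\ \|g_{j'}\|_X \le 2^{j'+2}}} Q(g_j, g_{j'}) > \frac{\tau}{10}\Bigr) = O\Bigl(\exp\Bigl(-\frac{1}{\beta}\log^2 N\Bigr)\Bigr). \qquad (7.6)$$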



The main difficulty is of course the presence of the suprema. On the other hand, the fact
that the functions $g_j, g_{j'}$ are well controlled both in entropy and in $X$ norm will allow us to
handle these suprema by relatively crude tools such as the union bound. By the pigeonhole
principle, we can bound the left-hand side of (7.6) by

$$\sum_{J_0 \le j \le j' \le J_1} \mathbf{P}\Bigl(\sup_{\substack{g_j \in A_j - A_{j+1},\ g_{j'} \in A_{j'} - A_{j'+1} \\ \|g_j\|_X \le 2^{j+2},\ \|g_{j'}\|_X \le 2^{j'+2}}} Q(g_j, g_{j'}) > \frac{c_0}{\log^2 N}\,\tau\Bigr)$$

for some small absolute constant $c_0$. Since the number of pairs $(j, j')$ is $O(\log^2 N)$, which
is much smaller than $\exp(\frac{1}{\beta}\log^2 N)$, it now suffices to show (after adjusting the value of $\beta$
slightly) that

$$\mathbf{P}\Bigl(\sup_{\substack{g_j \in A_j - A_{j+1},\ g_{j'} \in A_{j'} - A_{j'+1} \\ \|g_j\|_X \le 2^{j+2},\ \|g_{j'}\|_X \le 2^{j'+2}}} Q(g_j, g_{j'}) > \frac{c_0}{\log^2 N}\,\tau\Bigr) = O\Bigl(\exp\Bigl(-\frac{1}{\beta}\log^2 N\Bigr)\Bigr)$$

whenever $J_0 \le j \le j' \le J_1$.

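Fix $j, j'$ as above. From (7.5) the number of possible values of $g_{j'}$ is at most
$\exp(\frac{C}{1 + 2^{2j'}} \cdot m \log^2 N)$. Thus by the union bound it suffices to show that

$$\mathbf{P}\Bigl(\sup_{g_j \in A_j - A_{j+1}:\ \|g_j\|_X \le 2^{j+2}} Q(g_j, g_{j'}) > \frac{c_0}{\log^2 N}\,\tau\Bigr) = O\Bigl(\exp\Bigl(-\frac{C}{1 + 2^{2j'}} \cdot m \log^2 N\Bigr)\Bigr)$$

for each $g_{j'} \in A_{j'} - A_{j'+1}$ with $\|g_{j'}\|_X \le 2^{j'+2}$; note that we can absorb the $\exp(-\frac{1}{\beta}\log^2 N)$
factor since $2^{2j'} \le m$.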

Fix $g_{j'}$. We could also apply the union bound to eliminate the supremum over $g_j$, but it
turns out that this will lead to inferior bounds at this stage, because of the poor control
on $g_{j'}$. We have to first split the frequency domain $\mathbb{Z}_N$ into two sets $\mathbb{Z}_N = E_1 \cup E_2$, where

$$E_1 := \Bigl\{k \in \mathbb{Z}_N : |\hat g_{j'}(k)| \ge \frac{C_0 2^j \log^2 N}{\sqrt{N}}\Bigr\} \quad \text{and} \quad E_2 := \Bigl\{k \in \mathbb{Z}_N : |\hat g_{j'}(k)| < \frac{C_0 2^j \log^2 N}{\sqrt{N}}\Bigr\}$$

for some large absolute constant $C_0$. Note that these sets depend on $g_{j'}$ but not on $g_j$. It
thus suffices to show that

$$\mathbf{P}\Bigl(\sup_{g_j \in A_j - A_{j+1}:\ \|g_j\|_X \le 2^{j+2}} Q_i(g_j, g_{j'}) > \frac{c_0}{\log^2 N}\,\tau\Bigr) = O\Bigl(\exp\Bigl(-\frac{C}{1 + 2^{2j'}} \cdot m \log^2 N\Bigr)\Bigr) \qquad (7.7)$$

for $i = 1, 2$, where we have substituted (7.2), and $Q_i(g_j, g_{j'})$ is the random variable

$$Q_i(g_j, g_{j'}) := \Bigl|\sum_{k \in E_i} (I_k - \tau)\,\mathrm{Re}\bigl(\hat g_j(k)\overline{\hat g_{j'}(k)}\bigr)\Bigr|.$$

We treat the cases $i = 1, 2$ separately.

Proof of (7.7) when $i = 1$. For the contribution of the large frequencies $E_1$ we will take
absolute values everywhere, which is fairly crude but conveys the major advantage that we
will be able to easily eliminate the supremum in $g_j$. Note that since

$$|\hat g_{j'}(k)| \le \frac{\|g_{j'}\|_X}{\sqrt{N}} \le \frac{2^{j'+2}}{\sqrt{N}},$$



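we see that this case is vacuous unless

$$\frac{2^{j'+2}}{\sqrt{N}} \ge \frac{C_0 2^j \log^2 N}{\sqrt{N}},$$

or in other words,

$$2^{j'-j} \ge \frac{C_0}{4}\log^2 N. \qquad (7.8)$$

We then use the crude bound

$$|\hat g_j(k)| \le \|g_j\|_X/\sqrt{N} \le 2^{j+2}/\sqrt{N}$$

and the triangle inequality to conclude

$$\sup_{g_j \in A_j - A_{j+1}} Q_1(g_j, g_{j'}) \le \frac{2^{j+2}}{\sqrt{N}} \sum_{k \in E_1} (I_k + \tau)\,|\hat g_{j'}(k)|.$$

By definition of $E_1$, we have

$$\frac{2^{j+2}}{\sqrt{N}} \sum_{k \in E_1} 2\tau\,|\hat g_{j'}(k)| \le \frac{8\tau}{C_0 \log^2 N} \sum_{k \in E_1} |\hat g_{j'}(k)|^2 \le \frac{32\tau}{C_0 \log^2 N},$$

where the last step uses $\|\hat g_{j'}\|_{\ell_2} = \|g_{j'}\|_{\ell_2} \le 2$. Writing $I_k + \tau = (I_k - \tau) + 2\tau$, we conclude that

$$\sup_{g_j \in A_j - A_{j+1}} Q_1(g_j, g_{j'}) \le \frac{2^{j+2}}{\sqrt{N}} \Bigl|\sum_{k \in E_1} (I_k - \tau)\,|\hat g_{j'}(k)|\Bigr| + \frac{32\tau}{C_0 \log^2 N}$$

and hence to prove (7.7) when $i = 1$, it would suffice (if $C_0$ is chosen sufficiently large) to
show that

$$\mathbf{P}\Bigl(\frac{2^{j+2}}{\sqrt{N}} \Bigl|\sum_{k \in E_1} (I_k - \tau)\,|\hat g_{j'}(k)|\Bigr| > \frac{c_0}{2\log^2 N}\,\tau\Bigr) = O\Bigl(\exp\Bigl(-\frac{C}{1 + 2^{2j'}} \cdot m \log^2 N\Bigr)\Bigr).$$

It thus suffices to show

$$\mathbf{P}\Bigl(\Bigl|\sum_{k \in E_1} (I_k - \tau)\,a(k)\Bigr| \ge \gamma\Bigr) = O\Bigl(\exp\Bigl(-\frac{C}{1 + 2^{2j'}} \cdot m \log^2 N\Bigr)\Bigr)$$

where $a(k) := |\hat g_{j'}(k)|$ and $\gamma := \frac{c_0 \tau \sqrt{N}}{2^{j+3}\log^2 N}$. Recall that

$$\|a\|_{\ell_\infty(E_1)} = O(\|g_{j'}\|_X/\sqrt{N}) = O(2^{j'}/\sqrt{N})$$

and

$$\|a\|_{\ell_2(E_1)} \le \|\hat g_{j'}\|_{\ell_2} = \|g_{j'}\|_{\ell_2} = O(1).$$

We apply Lemma 6.6 and obtain

$$\mathbf{P}\Bigl(\Bigl|\sum_{k \in E_1} (I_k - \tau)\,a(k)\Bigr| \ge \gamma\Bigr) = O\Bigl(\exp\Bigl(-\frac{C\gamma^2}{4\tau + \gamma 2^{j'}/\sqrt{N}}\Bigr)\Bigr).$$

Using (7.8), we see that $\gamma 2^{j'}/\sqrt{N} \ge c \cdot \tau$ for some absolute constant $c > 0$, and conclude
that

$$\mathbf{P}\Bigl(\Bigl|\sum_{k \in E_1} (I_k - \tau)\,a(k)\Bigr| \ge \gamma\Bigr) = O\Bigl(\exp\Bigl(-\frac{C \cdot c_0 \cdot \tau N}{2^{j+j'}\log^2 N}\Bigr)\Bigr).$$

Taking logarithms, we deduce that this contribution will be acceptable if

$$\frac{1}{1 + 2^{2j'}}\,m \log^2 N \le C \cdot \frac{c_0\,\tau N}{2^{j+j'}\log^2 N},$$

which holds (with some room to spare) thanks to (7.2).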

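Proof of (7.7) when $i = 2$. For the contribution of the small frequencies $E_2$ we use (7.5)
and the union bound, and reduce to showing that

$$\mathbf{P}\Bigl(Q_2(g_j, g_{j'}) > \frac{c_0}{\log^2 N}\,\tau\Bigr) = O\Bigl(\exp\Bigl(-\frac{C}{1 + 2^{2j}} \cdot m \log^2 N\Bigr)\Bigr) \qquad (7.9)$$

for any $g_j \in A_j - A_{j+1}$.

Fix $g_j$, and set $a(k) := \mathrm{Re}\bigl(\hat g_j(k)\overline{\hat g_{j'}(k)}\bigr)$; thus

$$Q_2(g_j, g_{j'}) = \Bigl|\sum_{k \in E_2} (I_k - \tau)\,a(k)\Bigr|.$$

By definition of $E_2$, we have

$$\|a\|_{\ell_\infty(E_2)} \le O\Bigl(\frac{2^j \log^2 N}{\sqrt{N}} \cdot \frac{\|g_j\|_X}{\sqrt{N}}\Bigr) = O\Bigl(\frac{2^{2j}\log^2 N}{N}\Bigr),$$

while from Hölder and Plancherel

$$\|a\|_{\ell_2(E_2)} \le \|\hat g_{j'}\|_{\ell_2}\,\|\hat g_j\|_{\ell_\infty} \le \|g_{j'}\|_{\ell_2}\,\frac{\|g_j\|_X}{\sqrt{N}} = O\Bigl(\frac{2^j}{\sqrt{N}}\Bigr).$$

We can apply Lemma 6.6 to conclude

$$\mathbf{P}\Bigl(Q_2(g_j, g_{j'}) > \frac{c_0}{\log^2 N}\,\tau\Bigr) = O\Bigl(\exp\Bigl(-\frac{C \cdot \frac{c_0^2}{\log^4 N} \cdot \tau^2}{\tau\,[2^{2j}/N] + \tau\,[c_0 2^{2j}\log^2 N/(N\log^2 N)]}\Bigr)\Bigr) = O\Bigl(\exp\Bigl(-\frac{C \cdot \log^{-4} N \cdot \tau N}{2^{2j}}\Bigr)\Bigr).$$

Taking logarithms, we thus see that this contribution will be acceptable if

$$\frac{1}{1 + 2^{2j}} \cdot m \log^2 N \le C \cdot \frac{\tau N}{2^{2j}\log^4 N},$$

which holds thanks to (7.2). This concludes the proof of Lemma 4.3 (assuming Proposition 7.2).

8 'Universal' Encoding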



Our results interact with the agenda of coding theory. In fact, one can think of the process 
of taking random measurements as a kind of universal coding strategy that we explain 
below. In a nutshell, consider an encoder/decoder pair which would operate roughly as
follows:

• The Encoder and the Decoder share a collection of random vectors $(X_k)$ where the
$X_k$'s are independent Gaussian vectors with standard normal entries. In practice, we
can imagine that the encoder would send the seed of a random generator so that the
decoder would be able to reconstruct those 'pseudo-random' vectors.

• Encoder. To encode a discrete signal $f$, the encoder simply calculates the coefficients
$y_k = \langle f, X_k \rangle$ and quantizes the vector $y$.

• Decoder. The decoder then receives the quantized values and reconstructs a signal by
solving the linear program (1.10).
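
The seed-sharing idea in the first bullet can be sketched as follows. This is only an illustration under our own assumptions: Python's `random.Random` stands in for the shared pseudo-random generator, quantization is uniform rounding with a hypothetical step `q`, and the function names are ours. The final step of the decoder, solving the linear program (1.10), is deliberately not implemented here.

```python
import random

def gaussian_vectors(seed, K, N):
    """Regenerate the K pseudo-random Gaussian vectors X_k in R^N from a shared seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]

def encode(f, seed, K, q):
    """Compute y_k = <f, X_k> and quantize each coefficient to an integer multiple index of q."""
    X = gaussian_vectors(seed, K, len(f))
    return [round(sum(a * b for a, b in zip(f, x)) / q) for x in X]

def decoder_inputs(codes, seed, N, q):
    """Everything the decoder holds before solving (1.10): the vectors X_k and the quantized y_k."""
    X = gaussian_vectors(seed, len(codes), N)
    y_q = [c * q for c in codes]
    return X, y_q

# Hypothetical sparse signal of length 32, encoded with 8 measurements.
f = [0.0] * 32
f[3], f[17] = 1.5, -0.7
codes = encode(f, seed=2024, K=8, q=0.05)
X, y_q = decoder_inputs(codes, seed=2024, N=len(f), q=0.05)
```

Only the seed and the integer codes need to be transmitted; the decoder rebuilds exactly the same vectors because the generator is deterministic given the seed.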

This encoding/decoding scheme is of course very different from those commonly discussed
in the literature of information theory. In this scheme, the encoder would not try to learn
anything about the signal, nor would it exploit any special structure of the signal; it would
blindly correlate the signal with noise and quantize the output, effectively doing very little
work. In other words, the encoder would treat each signal in exactly the same way, hence
the name "universal encoding." There are several aspects of such a strategy which seem
worth exploring:

• Robustness. A fundamental problem with most existing coding strategies is their
fragility vis-à-vis bit loss. Take JPEG 2000, the current digital still-picture compression
standard, for example. Not all the bits in JPEG 2000 have the same value,
and if important bits are missing (e.g. because of packet loss), then there is simply
no way the information can be retrieved accurately.

The situation is very different when one is using the scheme suggested above. Suppose
for example that with a little more than $K$ coefficients one achieves the distortion
obeying the power-law

$$\|f - f^\sharp\|^2 \le 1/K. \qquad (8.1)$$

(This would correspond to the situation where our objects are bounded in Thus 
receiving a little more than $K$ random coefficients essentially allows one to reconstruct a
signal as precisely as if one knew the $K$ largest coefficients.

Now suppose that in each packet of information, we have encoded both the (quantized)
value of the coefficients $y_k$ and the label of the corresponding coefficients $k$.
Consider now a situation in which half of the information is lost in the sense that
only half of the coefficients are actually received. What is the accuracy of the decoded
message $f^\sharp_{50\%}$? This essentially corresponds to reducing the number of randomly
sampled coefficients by a factor of two, and so by (8.1) we see that the distortion
would obey

$$\|f - f^\sharp_{50\%}\|^2 \le 2/K \qquad (8.2)$$

and, therefore, losses would have minimal effect.

• Security. Suppose that someone would intercept the message. Then he/she would
not be able to decode the message because he/she would not know in which random
basis the coefficients are expressed. (In practice, in the case where one would
exchange the seed of a random generator, one could imagine protecting it with standard
technologies such as RSA. Thus this scheme can be viewed as a variant of the
standard stream cipher, based on applying an XOR operation between the plain text
and a pseudorandom keystream, but with the advantage of robustness.)

• Cost Efficiency. Nearly all coding scenarios work roughly as follows: we acquire a
large number of measurements about an object of interest, which we then encode.
This encoding process effectively discards most of the measured data so that only a
fraction of the measurement is being transmitted. For concreteness, consider JPEG
2000, a prototype of a transform coder. We acquire a large number $N$ of sample
values of a digital image $f$. The encoder then computes all the $N$ wavelet coefficients
of $f$, and quantizes only the $B \ll N$ largest, say. Hence only a very small fraction of
the wavelet coefficients of $f$ are actually transmitted.

In stark contrast, our encoder makes measurements that are immediately used. Suppose
we could design sensors which could actually measure the correlations $\langle f, X_k \rangle$.
Then not only would the decoded object be nearly as good (in the $\ell_2$-distance) as
that obtained by knowing all the wavelet coefficients and selecting the largest (it is
expected that the $\ell_1$-reconstruction is well-behaved vis-à-vis quantization), but we
would effectively encode all the measured coefficients and thus, we would not discard
any data available about $f$ (except for the quantization).

Even if one could make all of this practical, a fundamental question remains: is this an 
efficient strategy? That is, for a class of interesting signals, e.g. a class of digital images 
with bounded variations, would it be possible to adapt the ideas presented in this paper 
to show that this scheme does not use many more bits than what is considered necessary? 
In other words, it appears interesting to subject this compression scheme to a rigorous 
information theoretic analysis. This analysis would need to address 1) how one would want
to efficiently quantize the values of the coefficients $\langle f, X_k \rangle$ and 2) how the quantization
quantitatively affects the precision of the reconstructed signal.

9 Discussion 
9.1 Robustness 

To be widely applicable, we need noise-aware variants of the ideas presented in this paper
which are robust against the effects of quantization, measurement noise and modeling error,
as no real-world sensor can make perfectly accurate measurements. We view these issues
as important research topics. For example, suppose that the measurements $y_k = \langle f, \psi_k \rangle$
are rounded to the nearest multiple of $q$, say, so that the available information is of the
form $y_k^q$ with $-q/2 \le y_k^q - y_k \le q/2$. Then we would like to know whether the solution
to (1.10) or better, of the variant

$$\min_g \|g\|_{\ell_1}, \quad \text{subject to} \quad \|F_\Omega g - y^q\|_{\ell_\infty} \le q/2$$

still obeys error estimates such as those introduced in Theorem 1.4. Our analysis seems to be
amenable to this situation and work in progress shows that the quality of the reconstruction
degrades gracefully as $q$ increases. Precise quantitative answers would help establish the
information theoretic properties of the scheme introduced in Section 8.
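
A quick way to see why the $\ell_\infty$ constraint with radius $q/2$ is the natural one: rounding to the nearest multiple of $q$ moves each measurement by at most $q/2$, so the true signal always remains feasible for the variant above. The sketch below checks this on a toy example; the matrix `F`, the signal `f` and the helper names are hypothetical, and no claim is made here about the accuracy of the minimizer.

```python
def quantize(y, q):
    """Round every measurement to the nearest multiple of q."""
    return [q * round(v / q) for v in y]

def is_feasible(g, F, y_q, q, tol=1e-12):
    """Check the constraint max_k |<F_k, g> - y_q[k]| <= q/2 (up to a small tolerance)."""
    return all(abs(sum(a * b for a, b in zip(row, g)) - yk) <= q / 2 + tol
               for row, yk in zip(F, y_q))

# Hypothetical 2 x 3 measurement matrix and signal.
F = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]
f = [0.3, -0.2, 0.1]
q = 0.25
y = [sum(a * b for a, b in zip(row, f)) for row in F]
y_q = quantize(y, q)

print(is_feasible(f, F, y_q, q))                 # the true signal satisfies the constraint
print(is_feasible([0.0, 0.0, 0.0], F, y_q, q))   # an arbitrary candidate need not
```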

9.2 Connections with other works 

Our results are connected with very recent work of A. Gilbert, S. Muthukrishnan, and
M. Strauss [31], [33]. In this work, one considers a discrete signal of length $N$ which
one would like to represent as a sparse superposition of sinusoids. In [32], the authors
develop a randomized algorithm that essentially samples the signal $f$ in the time domain
$O(B^2\,\mathrm{poly}(\log N))$ times ($\mathrm{poly}(\log N)$ denotes a polynomial term in $\log N$) and returns a
vector of approximate Fourier coefficients. They show that under certain conditions, this
vector gives, with positive probability, an approximation to the discrete Fourier transform
of $f$ which is almost as good as that obtained by keeping the $B$ largest entries of the discrete
Fourier transform of $f$. In [33], the algorithm was refined so that (1) only $O(B\,\mathrm{poly}(\log N))$
samples are needed and (2) the algorithm runs in $O(B\,\mathrm{poly}(\log N))$ time, which truly
is a remarkable feat. To achieve this gain, however, one has to sample the signal on highly
structured random grids.

Our approach is different in several aspects. First and foremost, we are given a fixed set of
nonadaptive measurements. In other words, the way in which we stated the problem does
not give us the 'luxury' of adaptively sampling the signals as in [33]. In this context, it is
unclear how the methodology presented in [32], [33] would allow reconstructing the signal
$f$ from $O(B\,\mathrm{poly}(\log N))$ arbitrary sampled values. In contrast, our results guarantee that
an accurate reconstruction is possible for nearly all possible measurement sets taken from
ensembles obeying UUP and ERP. Second, the methodology there essentially concerns
the recovery of spiky signals from frequency samples and does not address other setups. Yet,
there certainly is a similar flavor in the statements of their results. Of special interest is
whether some of the ideas developed by this group of researchers might be fruitful to attack
problems such as those discussed in this article.

While finishing the write-up of this paper, we became aware of very recent and independent
work by David Donoho on a similar project [24]. In that paper, which appeared one month
before ours, Donoho essentially proves Theorem 1.1 for Gaussian ensembles. He also shows
that if a measurement matrix obeys 3 conditions (CS1-CS3), then one can obtain the
estimate (1.11). There is some overlap in methods, in particular the estimates of Szarek
[52] on the condition numbers of random matrices (CS1) also play a key role in those papers,
but there is also a greater reliance in those papers on further facts from high-dimensional
geometry, in particular in understanding the shape of random sections of the $\ell_1$ ball (CS2-CS3).
Our proofs are completely different in style and approach, and most of our claims are
different. While [24] only derives results for the Gaussian ensemble, this paper establishes
that other types of ensembles such as the binary and the Fourier ensembles and even
arbitrary measurement/synthesis pairs will work as well. This is important because this
shows that concrete sensing mechanisms may be used in concrete applications.

In a companion [12] to this paper we actually improve on the results presented here and
show that Theorem 1.4 holds for general measurement ensembles obeying the UUP. The
implication for the Gaussian ensemble is that the recovery holds with an error in (1.11) of
size at most a constant times $(K/\log(N/K))^{-r}$.



10 Appendix: Proof of entropy estimate 



In this section we prove Proposition 7.2. The material here is to a large extent borrowed
from that in [8], [7], [43].

The entropy of the unit ball of a Hilbert space can be estimated using the dual Sudakov
inequality of Pajor and Tomczak-Jaegermann [44] (see [8], [13] for a short "volume packing"
proof, and [13] for further discussion):

Lemma 10.1 [44] Let $H$ be an $n$-dimensional Hilbert space with norm $\|\cdot\|_H$, and let $B_H$
be the associated unit ball. Let $e_1, \ldots, e_n$ be an orthonormal basis of the Hilbert space $H$,
and let $Z_1, \ldots, Z_n \sim N(0, 1)$ be i.i.d. standard Gaussian random variables. Let $\|\cdot\|_Y$ be
any other norm on $\mathbb{C}^n$. Then we have

$$\mathcal{E}(B_H, B_Y, r) \le C r^{-2} \cdot \mathbf{E}\Bigl(\Bigl\|\sum_{j=1}^{n} Z_j e_j\Bigr\|_Y\Bigr)^2$$

where $C$ is an absolute constant (independent of $n$).



To apply this Lemma, we need to estimate the $X$ norm of certain randomized signs.
Fortunately, this is easily accomplished:

Lemma 10.2 Let $f \in \ell_2(\mathbb{Z}_N)$ and $Z(t)$, $t \in \mathbb{Z}_N$, be i.i.d. standard Gaussian random
variables. Then

$$\mathbf{E}(\|Zf\|_X) \le C \cdot \sqrt{\log N} \cdot \|f\|_{\ell_2}.$$

The same statement holds if the $Z$'s are i.i.d. Bernoulli symmetric random variables
($Z(t) = \pm 1$ with equal probability).

Proof Let us normalize $\|f\|_{\ell_2} = 1$. For any $\lambda > 0$, we have

$$\mathbf{P}(\|Zf\|_X > \lambda) = \mathbf{P}\Bigl(\Bigl|\sum_{t \in \mathbb{Z}_N} Z(t) f(t) e^{-2\pi i tk/N}\Bigr| > \lambda \text{ for some } k \in \mathbb{Z}_N\Bigr) \le N \sup_{k \in \mathbb{Z}_N} \mathbf{P}\Bigl(\Bigl|\sum_{t \in \mathbb{Z}_N} Z(t) f(t) e^{-2\pi i tk/N}\Bigr| > \lambda\Bigr).$$

If the $Z(t)$ are i.i.d. normalized Gaussians, then for each fixed $k$, $\sum_{t \in \mathbb{Z}_N} Z(t) f(t) e^{-2\pi i tk/N}$
is a Gaussian with mean zero and standard deviation $\|f\|_{\ell_2} = 1$. Hence

$$\mathbf{P}(\|Zf\|_X > \lambda) \le C \cdot N \cdot e^{-\lambda^2/2}.$$

Combining this with the trivial bound $\mathbf{P}(\|Zf\|_X > \lambda) \le 1$ and then integrating in $\lambda$ gives
the result. The claim for i.i.d. Bernoulli variables is similar but uses Hoeffding's inequality;
we omit the standard details. ■
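
The bound of Lemma 10.2 is easy to probe numerically. The following Monte Carlo sketch estimates $\mathbf{E}\|Zf\|_X$ for a flat unit-norm $f$ and compares it with $\sqrt{\log N}$. It assumes the $X$ norm is $\sqrt{N}\,\|\hat f\|_{\ell_\infty}$ with a normalized Fourier transform, so that $\|g\|_X = \max_k |\sum_t g(t) e^{-2\pi i tk/N}|$; the sample size, the seed and the function names are arbitrary choices of ours, and the experiment illustrates the lemma rather than proving anything.

```python
import cmath
import math
import random

def x_norm_of_randomized(f, z):
    """||Zf||_X computed as max_k |sum_t Z(t) f(t) exp(-2 pi i t k / N)|."""
    n = len(f)
    return max(abs(sum(z[t] * f[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n)))
               for k in range(n))

def mean_x_norm(f, trials, rng, bernoulli=False):
    """Empirical mean of ||Zf||_X over independent draws of Gaussian or +-1 multipliers Z."""
    total = 0.0
    for _ in range(trials):
        if bernoulli:
            z = [rng.choice((-1.0, 1.0)) for _ in f]
        else:
            z = [rng.gauss(0.0, 1.0) for _ in f]
        total += x_norm_of_randomized(f, z)
    return total / trials

rng = random.Random(0)
N = 32
f = [1 / math.sqrt(N)] * N   # flat vector with unit l2 norm
ratio = mean_x_norm(f, 200, rng) / math.sqrt(math.log(N))
ratio_bernoulli = mean_x_norm(f, 200, rng, bernoulli=True) / math.sqrt(math.log(N))
print(ratio, ratio_bernoulli)
```

Both ratios should stay of order one as $N$ varies, which is what the lemma predicts up to the unspecified constant $C$.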



Combining this lemma with Lemma 10.1 we immediately obtain

Corollary 10.3 Let $E$ be a non-empty subset of $\mathbb{Z}_N$; note that $\ell_2(E)$ is both a Hilbert
space (with the usual Hilbert space structure), as well as a normed vector space with the $X$
norm. For all $r > 0$, we have

$$\mathcal{E}(B_{\ell_2(E)}, B_X, r) \le C r^{-2} \cdot |E| \cdot \log N.$$

Now we turn to the set $U_m$ introduced in the preceding section. Since the number of sets
$E$ of cardinality $m$ is $\binom{N}{m} \le N^m$, we have the crude bound

$$N(U_m, B_X, r) \le N^m \sup_{E \subset \mathbb{Z}_N,\ |E| = m} N(B_{\ell_2(E)}, B_X, r)$$

and hence by Corollary 10.3

$$\mathcal{E}(U_m, B_X, r) \le C(1 + r^{-2})\,m \log N. \qquad (10.1)$$

This already establishes (7.4) in the range $C^{-1} \le r \le C\sqrt{\log N}$. However, this bound is
quite poor when $r$ is large. For instance, when $r \ge m^{1/2}$ we have

$$\mathcal{E}(U_m, B_X, r) = 0 \qquad (10.2)$$

since we have $\|f\|_X \le m^{1/2}$ whenever $\|f\|_{\ell_2} \le 1$ and $|\mathrm{supp}(f)| \le m$. In the regime
$1 \ll r \le m^{1/2}$ we can use the following support reduction trick of Bourgain to obtain a
better bound:



Lemma 10.4 If $r \ge C\sqrt{\log N}$ and $m \ge C$, then

$$N(U_m, B_X, r) \le N\Bigl(U_{m/2 + C\sqrt{m}},\ B_X,\ \frac{r}{\sqrt{2} + C/\sqrt{m}} - C\sqrt{\log N}\Bigr).$$

Proof Let $f \in U_m$ and $E := \mathrm{supp}(f)$, thus $\|f\|_{\ell_2} \le 1$ and $|E| \le m$. Let $\sigma(t) = \pm 1$ be
i.i.d. Bernoulli symmetric variables. We write $f = \sigma f + (1 - \sigma)f$. From Lemma 10.2 for
Bernoulli variables we have

$$\mathbf{E}(\|\sigma f\|_X) \le C\sqrt{\log N}$$

and hence by Markov's inequality

$$\mathbf{P}(\|\sigma f\|_X \ge C\sqrt{\log N}) \le \frac{1}{10}$$

for a suitable absolute constant $C$. Also observe that

$$\|(1 - \sigma)f\|_{\ell_2}^2 = \sum_{t \in E:\ \sigma(t) = -1} 4|f(t)|^2 = 2\|f\|_{\ell_2}^2 - 2\sum_{t \in E} \sigma(t)|f(t)|^2,$$

and hence by Hoeffding's or Khintchine's inequalities and the normalization $\|f\|_{\ell_2} \le 1$

$$\mathbf{P}(\|(1 - \sigma)f\|_{\ell_2} \ge \sqrt{2} + C/\sqrt{m}) \le \frac{1}{10}$$

for a suitable absolute constant $C$. In a similar spirit, we have

$$|\mathrm{supp}((1 - \sigma)f)| = |\{t \in \mathrm{supp}(f) : \sigma(t) = -1\}| = \frac{1}{2}|\mathrm{supp}(f)| - \frac{1}{2}\sum_{t \in E} \sigma(t)$$

and hence

$$\mathbf{P}\Bigl(|\mathrm{supp}((1 - \sigma)f)| \ge \frac{m}{2} + C\sqrt{m}\Bigr) \le \frac{1}{10}$$

for a suitable absolute constant $C$. Combining all these estimates together, we see that
there exists a deterministic choice of signs $\sigma(t) = \pm 1$ (depending on $f$ and $E$) such that

$$\|\sigma f\|_X \le C\sqrt{\log N}; \quad \|(1 - \sigma)f\|_{\ell_2} \le \sqrt{2} + C/\sqrt{m}; \quad |\mathrm{supp}((1 - \sigma)f)| \le \frac{m}{2} + C\sqrt{m}.$$

In particular, $f$ is within $C\sqrt{\log N}$ (in $X$ norm) of $(\sqrt{2} + C/\sqrt{m}) \cdot U_{m/2 + C\sqrt{m}}$. We thus
have

$$N(U_m, B_X, r) \le N\bigl((\sqrt{2} + C/\sqrt{m})\,U_{m/2 + C\sqrt{m}},\ B_X,\ r - C\sqrt{\log N}\bigr)$$

and the claim follows. ■
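
The two algebraic facts used in the proof, the splitting $f = \sigma f + (1 - \sigma)f$ and the identity for $\|(1 - \sigma)f\|_{\ell_2}^2$, can be checked on random data. The sketch below does this for one draw of signs; the sizes, the seed and the helper name `split_by_signs` are hypothetical, and the probabilistic bounds themselves are not tested.

```python
import math
import random

def split_by_signs(f, sigma):
    """Return (sigma * f, (1 - sigma) * f) for signs sigma(t) in {-1, +1}."""
    sf = [s * a for s, a in zip(sigma, f)]
    rest = [(1 - s) * a for s, a in zip(sigma, f)]
    return sf, rest

rng = random.Random(1)
N, m = 256, 64
support = rng.sample(range(N), m)
f = [0.0] * N
for t in support:
    f[t] = rng.gauss(0.0, 1.0)
norm = math.sqrt(sum(a * a for a in f))
f = [a / norm for a in f]                      # unit l2 norm, support of size m

sigma = [rng.choice((-1, 1)) for _ in range(N)]
sf, rest = split_by_signs(f, sigma)

# Identity from the proof: ||(1 - sigma) f||^2 = 2 ||f||^2 - 2 sum_t sigma(t) |f(t)|^2.
lhs = sum(a * a for a in rest)
rhs = 2 * sum(a * a for a in f) - 2 * sum(s * a * a for s, a in zip(sigma, f))
# (1 - sigma) f lives only where sigma(t) = -1, i.e. on roughly half of the support.
supp_rest = sum(1 for a in rest if a != 0.0)
print(lhs, rhs, supp_rest, m)
```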

Iterating this lemma roughly $\log_{\sqrt{2}} \frac{r}{\sqrt{\log N}}$ times to reduce $m$ and $r$, and then applying
(10.1) once $r$ becomes comparable with $\sqrt{\log N}$, we obtain

$$\mathcal{E}(U_m, B_X, r) \le C r^{-2} m (\log N)^2 \quad \text{whenever } C\sqrt{\log N} \le r \le m^{1/2},$$

which (together with (10.2)) yields (7.4) for all $r \ge C\sqrt{\log N}$.

It remains to address the case of small $r$, say $N^{-2} < r < 1/2$. A simple covering argument
(see [8, Lemma 2.7]; the basic point is that $B_{\ell_2(E)}$ can be covered by $O(r^{-C|E|})$ translates
of $r \cdot B_{\ell_2(E)}$) gives the general inequality

$$\mathcal{E}(B_{\ell_2(E)}, B_X, r) \le C|E|\log\frac{1}{r} + \mathcal{E}(B_{\ell_2(E)}, B_X, 1)$$

for $0 < r < 1/2$, and hence by Corollary 10.3

$$\mathcal{E}(B_{\ell_2(E)}, B_X, r) \le C|E|\log\frac{1}{r} + C|E|\log N.$$

Arguing as in the proof of (10.1) we thus have

$$\mathcal{E}(U_m, B_X, r) \le C m \Bigl(\log N + \log\frac{1}{r}\Bigr),$$

which gives (7.4) in the range $N^{-2} < r < 1/2$. This completes the proof of Proposition 7.2.